Group Art Unit
Examiner Name
Attorney Docket Number
Title: METHODS OF CLONING AND PRODUCING FRAGMENT CHAINS WITH
READABLE INFORMATION CONTENT 



PRELIMINARY AMENDMENT 



Assistant Commissioner for Patents 
Washington, D.C. 20231 

Dear Sir: 

Please enter the following amendments before calculation of the filing fee and
examination on the merits.

IN THE CLAIMS: 

Please amend claims 8-13 as follows: 

8. (Amended) A method as claimed in claim 1, 2 or 3 wherein said fragments are each 
between 8 and 25 bases in length. 

9. (Amended) A method as claimed in claim 1, 2 or 3 wherein n is at least 10. 

10. (Amended) A method of synthesizing a double stranded nucleic acid molecule 
comprising at least the steps of: 

1) generating fragment chains according to the method defined in claim 1, 2 or 3;
2) optionally generating single stranded regions at the end of said fragment chains,
wherein said single stranded regions are complementary to the single stranded regions 
on said fragment chains thus forming complementary pairs of single stranded regions; 

3) contacting said fragment chains with one another, simultaneously or consecutively, 
to effect binding of said complementary pairs of single stranded regions. 

11. (Amended) A nucleic acid molecule produced according to a method as defined in 
claim 1, 2 or 3, or a single stranded nucleic acid molecule thereof. 

12. (Amended) A method of identifying the code elements contained in a nucleic acid 
molecule prepared according to a method as defined in claim 1, 2 or 3, wherein a probe, 
carrying a signalling means, specific to one or more code elements, is bound to said nucleic
acid molecule and a signal generated by said signalling means is detected, whereby said one or 
more code elements may be identified. 

13. (Amended) A library of fragments as defined in claim 1, 2 or 3, comprising (n) m
fragments, wherein n is as defined in claim 1, 2 or 3 and corresponds to the length of chain that 
said library may produce, and m is an integer corresponding to the number of possible code 
elements or combinations thereof, such that fragments corresponding to all possible code 
elements for each position in the final chain are provided. 



IN THE ABSTRACT:

Please add the following abstract on the accompanying separate sheet. 
REMARKS



The accompanying amendments are being made to eliminate multiple dependencies in
the claims and to place the Abstract in better U.S. form.

Amended Claims: Version to show changes made

8. (Amended) A method as claimed in [any one of claims 1 to 7] claim 1, 2 or 3 wherein
said fragments are each between 8 and 25 bases in length. 

9. (Amended) A method as claimed in [any one of claims 1 to 8] claim 1, 2 or 3 wherein
n is at least 10. 

10. (Amended) A method of synthesizing a double stranded nucleic acid molecule 
comprising at least the steps of: 

1) generating fragment chains according to the method defined in [any one of claims 1 
to 9] claim 1, 2 or 3;

2) optionally generating single stranded regions at the end of said fragment chains, 
wherein said single stranded regions are complementary to the single stranded regions 
on said fragment chains thus forming complementary pairs of single stranded regions; 

3) contacting said fragment chains with one another, simultaneously or consecutively, 
to effect binding of said complementary pairs of single stranded regions. 

11. (Amended) A nucleic acid molecule produced according to a method as defined in 
[any one of claims 1 to 10] claim 1, 2 or 3, or a single stranded nucleic acid molecule thereof.

12. (Amended) A method of identifying the code elements contained in a nucleic acid 
molecule prepared according to a method as defined in [any one of claims 1 to 10] claim 1, 2 or 
3, wherein a probe, carrying a signalling means, specific to one or more code elements, is 
bound to said nucleic acid molecule and a signal generated by said signalling means is 
detected, whereby said one or more code elements may be identified. 

13. (Amended) A library of fragments as defined in [any one of claims 1 to 12] claim 1,
2 or 3, comprising (n) m fragments, wherein n is as defined in [any one of claims 1 to 12] claim 1,
2 or 3 and corresponds to the length of chain that said library may produce, and m is an integer
corresponding to the number of possible code elements or combinations thereof, such that
fragments corresponding to all possible code elements for each position in the final chain are 
provided. 
ABSTRACT

The present invention provides a method of attaching a fragment of a first nucleic acid
molecule to a second nucleic acid molecule using adapters to mediate the binding, particularly in
methods of cloning; methods of producing fragment chains with readily readable information
content, particularly chains comprising fragments corresponding to code, such as alphanumeric
code; the nucleic acid molecules thus produced; and kits for performing such methods.
PATENT AND TRADEMARK OFFICE

First Named Inventor: LEXOW, Preben
Group Art Unit: Unassigned
Examiner Name: Unassigned
Attorney Docket Number: 1181-256

Title: METHODS OF CLONING AND PRODUCING FRAGMENT CHAINS WITH
READABLE INFORMATION CONTENT



SECOND PRELIMINARY AMENDMENT and 
RESPONSE TO NOTIFICATION OF MISSING REQUIREMENTS 
UNDER 35 U.S.C. §371 

Assistant Commissioner for Patents
Box PCT
Washington, DC 20231

Dear Sir:

In response to the Notification of Missing Requirements
dated March 22, 2002 (copy enclosed), enclosed are the Declaration
and Power of Attorney and a check for $785.00 to cover the $65.00
surcharge for late filing of the declaration and the $720.00
four-month extension of time fee. Please charge any additional
fees to deposit account number 02-2135 in the name of Rothwell,
Figg, Ernst & Manbeck.

Attached is the sequence listing in paper and computer 
readable form. 

Entry of the following amendments is respectfully requested. 
IN THE SPECIFICATION:

Please enter the attached Sequence Listing submitted
herewith.

Amend the specification as shown on the following pages. 
Marked-up copies of the original text of the amended
specification are attached to this amendment. Material inserted
is indicated by underline (underline) and material deleted is
indicated by angled brackets (<angled brackets>).

Clean copy of the amended specification (paragraph on page 14 at
line 26-page 15 at line 11)

To increase the number of permutations in an adapter 
library, two separate oligonucleotide libraries may be generated, 
one with single stranded oligonucleotides with regions that will 
correspond to the single stranded region of the first nucleic 
acid molecule fragment and the second library with single 
stranded oligonucleotides with regions that will correspond to 
the single stranded region of the second nucleic acid molecule 
(e.g. vector). However, common to each member of the library
is a complementary region, such that when one member from the
first library is selected and combined with a member of the
second library, they will hybridize, leaving free the relevant
single stranded regions. Thus, for example, to generate an adapter
with an AA overhang and a TC overhang to bind to the first and
second nucleic acid molecules respectively, members of the
different libraries such as GGCCCCCNNAA [SEQ ID NO:1] may be
combined with 3'-TCNNNCCGGGG-5' [SEQ ID NO:2] to form:
GGCCCCCNNAA [SEQ ID NO:1]
TCNNNCCGGGG [SEQ ID NO:2]
which exhibits the appropriate overhangs. When using only two
16-member libraries, this allows the production of 256 different
adapters.
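The figure of 256 adapters is simply every member of one library paired with every member of the other (16 x 16). The short Python sketch below makes that count concrete. It assumes, beyond what the text states, that each 16-member library holds the 16 possible two-base overhangs (as in the AA and TC example), and the function names are illustrative only.

```python
from itertools import product

BASES = "ACGT"


def two_base_overhangs() -> list[str]:
    """The 4 x 4 = 16 possible two-base overhangs (assumed library members)."""
    return ["".join(pair) for pair in product(BASES, repeat=2)]


def adapter_combinations(first_lib: list[str], second_lib: list[str]) -> list[tuple[str, str]]:
    """One adapter per (first-library member, second-library member) pairing;
    the shared complementary region is assumed to hybridize in every case."""
    return list(product(first_lib, second_lib))


first_library = two_base_overhangs()   # overhangs facing the first molecule
second_library = two_base_overhangs()  # overhangs facing the second molecule (e.g. vector)
adapters = adapter_combinations(first_library, second_library)

print(len(first_library), len(second_library), len(adapters))  # 16 16 256
print(("AA", "TC") in adapters)  # True
```

Because the two libraries are combined freely, the adapter count grows as the product of the library sizes, which is why two small libraries suffice.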
Clean copy of the amended specification (paragraph on page 16 at
line 20-page 17 at line 11)

Over 100 classes of IIS restriction endonucleases have been
identified, and there are large variations both with respect to
substrate specificity and cleavage pattern. In addition, these
enzymes have proved to be well suited to "module swapping"
experiments, so that one can create new enzymes for particular
requirements (Huang B. et al., J. Protein Chem. 1996, 15(5):481-9;
Bickle T.A., 1993, in Nucleases (2nd edn); Kim Y.G. et al., PNAS
1994, 91:883-887). In these experiments the binding domain of
transcription factor Sp1 was merged with the cleavage domain of
FokI to construct a class IIS restriction endonuclease that makes
a 4-base overhang at Sp1 sites. In other experiments a class
IIS restriction endonuclease that cuts outside the binding sites
of transcription factor Ultrabithorax was generated.
Corresponding experiments have been conducted on class I enzymes.
By merging the N-terminal part of the hsdS sub-unit of StyR 1241
(which recognizes GAAN6RTCG [SEQ ID NO:82]) with the C-terminal
part of the hsdS sub-unit of StyR 1241 (which recognizes
TCAN7RTTC [SEQ ID NO:83]), a new enzyme that recognizes the
sequence GAAN6RTTC [SEQ ID NO:84] was constructed. Several other
experiments have been carried out with similar success. Unlike
in the case of ordinary class II enzymes, it is therefore
reasonable to assume that a number of new IIS and IP restriction
enzymes can be constructed and adapted to cloning requirements
that may arise in the future. Very many combinations and
variants of these enzymes can therefore be used according to the
principles described herein.
Clean copy of the amended specification (paragraph on page 44 at
line 30-page 45 at line 25)

The following examples are given by way of illustration only, in
which the Figures referred to are as follows:

Figure 1 shows a schematic representation of how the method of 
the invention may be used to introduce an insert into a vector, 
in which the insert is cleaved from the first nucleic acid 
molecule, associated with adapters and ligated thereto and then 
ligated into the vector; 

Figure 2 shows the production of a fragment chain using 8 "0" and
"1" starting fragments with different overhangs (aaaaaaaaaa [SEQ
ID NO:100], aaaaaaaaac [SEQ ID NO:54], aaaaaaaccg [SEQ ID NO:57],
ccccccccccgg [SEQ ID NO:59], cccccccccgcg [SEQ ID NO:56],
cccccccccttt [SEQ ID NO:53], ggggggggaaa [SEQ ID NO:51],
ggggggggaac [SEQ ID NO:52], ggggggggccg [SEQ ID NO:55],
ttttttttcgg [SEQ ID NO:60], ttttttttgcg [SEQ ID NO:58],
ttttttttttt [SEQ ID NO:101]);

Figure 3 shows the production of a 64-fragment chain in which 8
chains are produced comprising 8 fragments each, in which the
termini of chains 1 and 2, and 2 and 3 etc. are complementary
such that they may be ligated together (aaaaaaaaaa [SEQ ID
NO:100], aaaaaaaaaaaaa [SEQ ID NO:102], aaaggggggggaaa [SEQ ID
NO:61], aacaaaaaaaaaa [SEQ ID NO:62], aacggggggggaaa [SEQ ID
NO:103], cttccccccccccg [SEQ ID NO:104], cttttttttttcg [SEQ ID
NO:105], ggggggggaaa [SEQ ID NO:51], gttccccccccccg [SEQ ID NO:65],
gttttttttttcg [SEQ ID NO:66], tttccccccccccg [SEQ ID NO:63],
tttttttttttcg [SEQ ID NO:64]);

Figure 4 shows 3 techniques for mixing "0", "1" fragments from a
library of fragments ordered for each position, in which in A)
appropriate fragments are selected by aspiration from appropriate
wells, B) appropriate fragments are released from the library
wells and C) a flow cytometer is used to direct appropriate
droplets to the mixing chamber;

Figure 5 shows PCR amplification of signal chain 1-0-1-0-0 using
SP6 and T7 primers. Lane 1: 1 µg of 1 kb DNA ladder (Gibco BRL),
Lane 2: 10 µl of PCR amplified fragment chain DNA using SP6 and
T7 primers. Lane 3: Same as lane 2 except for the use of SP6 and
T7-Cy5 primers; and

Figure 6 shows the use of primer pairs during the process of 
amplification to join together fragment chains. 

Clean copy of the amended specification (paragraph on page 48 at 
lines 21-34) 

Materials:

Oligonucleotides used to address PhiX174 overhangs:
BbvI overhang 1a:

5'- CGA GCG CCT CCA GTG CAG CGG AG [SEQ ID NO:3]
BbvI overhang 5a:

5'- TATC GCG CCT CCA GTG CAG CGG AG [SEQ ID NO:4]
BbvI overhang 6b:

5'- CTCT GCG CCT CCA GTG CAG CGG AG [SEQ ID NO:5]
BbvI overhang 6 (delC):

5'- CTCT CTC CGC TGC ACT GGA GGC GC [SEQ ID NO:6]
BbvI overhang 7a:

5'- CAAC GCG CCT CCA GTG CAG CGG AG [SEQ ID NO:7]
BbvI overhang 9b:

5'- GGTA GCG CCT CCA GTG CAG CGG AG [SEQ ID NO:8]
Clean copy of the amended specification (paragraph on page 49 at
lines 1-5)

Oligonucleotides used to address pUC19 overhangs:
Cloning site 1a

5'- AAGAG CTC CGC TGC ACT GGA GGC GC [SEQ ID NO:9]
Cloning site 1b

5'- CTCTT CTC CGC TGC ACT GGA GGC GC [SEQ ID NO:10]

Clean copy of the amended specification (paragraph on page 53 at 
line 11-page 54 at line 6) 

In this Example, the location of the binding motifs of the 
initiation linkers is shown below: 



FokI    GGATG
Bst71I  GCAGC
HgaI    GACGC
BplI    GAG CTC
BaeI    CYATG CA
CjeI    CCA GT
HaeIV   GAY RTC

Consensus: GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC [SEQ ID NO:11]
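As an illustration of where the listed motifs sit on the consensus, the Python sketch below searches the ungapped consensus (top strand only) for the three motifs in the list that contain no degenerate bases (FokI, Bst71I, HgaI). It is an illustrative check, not part of the described method; the helper name is hypothetical, and the gapped or degenerate motifs (BplI, BaeI, CjeI, HaeIV) are deliberately left out.

```python
# Recognition motifs with no degenerate bases, taken from the motif list above.
MOTIFS = {"FokI": "GGATG", "Bst71I": "GCAGC", "HgaI": "GACGC"}

# Consensus as printed above; '-' marks alignment gaps.
CONSENSUS = "GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC"


def find_sites(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every match of motif in seq (top strand only)."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


ungapped = CONSENSUS.replace("-", "")  # 32 bases
for enzyme, motif in MOTIFS.items():
    print(enzyme, motif, find_sites(ungapped, motif))
# FokI GGATG [23]
# Bst71I GCAGC [0]
# HgaI GACGC [27]
```

The FokI and HgaI motifs overlap by one base (the G at position 27), which is consistent with the consensus being built to carry several type IIS binding sites in a compact region.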



Initiation linkers:

X=0: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGPPPPPP [SEQ ID NO:12]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC [SEQ ID NO:69]

X=1: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATG-PPPPPP [SEQ ID NO:13]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC- [SEQ ID NO:70]

X=2: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATG--PPPPPP [SEQ ID NO:14]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC-- [SEQ ID NO:71]

Serial No. 10/019,258
September 23, 2002
Page 7

X=3: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATG PPPPPP [SEQ ID NO:15]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC [SEQ ID NO:72]

X=4: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGCPPPPPP [SEQ ID NO:16]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG [SEQ ID NO:73]

X=5: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC-PPPPPP [SEQ ID NO:17]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG- [SEQ ID NO:74]

X=6: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC--PPPPPP [SEQ ID NO:18]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG-- [SEQ ID NO:75]

X=7: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC PPPPPP [SEQ ID NO:19]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG [SEQ ID NO:76]

X=8: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC PPPPPP [SEQ ID NO:20]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG [SEQ ID NO:77]

X=9: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC PPPPPP [SEQ ID NO:21]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG-- [SEQ ID NO:78]



Clean copy of the amended specification (paragraph on page 54 at 
lines 21-35) 

Propagation linkers:

FokI:   5' GGATG
        3' CCTACNNNN

Bst71I: 5' GCAGC
        3' CGTCGNNNN

HgaI:   5' GACGC
        3' CTGCGNNNNN [SEQ ID NO:79]

SplI:   5' GAG CTCNNNNN
        3' CTC GAG

BaeI:   5' CCATG CANNNNN
        3' GGTAC GT

HaeIV:  5' GAC GTCNNNNNN
        3' CTG CTG

CjeI:   5' CCA GTNNNNNN
        3' GGT CA

Clean copy of the amended specification (paragraph on page 55 at 
lines 28-36) 

The 3'-GAGTGC overhang is then ligated with the X=3 initiation
linker and the GTGAA-3' overhang is ligated with the CACTT-3'
overhang on the target DNA molecule:

5'--GCAGCGACCATGAGTCCA-CTC--GTGGATG PPPPPP [SEQ ID NO:15]

3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC QQQQQQ [SEQ ID NO:85]

GTGAA 3'

CACTT 5'

Clean copy of the amended specification (paragraphs on page 56 at 
line 15-page 58 at line 7) 

Method 1

Two IIS enzymes that generate 5' 4-base overhangs (BbsI and
Esp3I):

5'..VVVVVVVVGAGC-GAGACG GAAGAC--GAGCIIIIIIIIII 3' [SEQ ID NO:86]

3' VVVVVVVVCTCG-CTCTGC CTTCTG--CTCGIIIIIIIIII..5' [SEQ ID NO:87]

After cleavage with BbsI and Esp3I:

..VVVVVVVV + GAGC-GAGACG GAAGAC-- [SEQ ID NO:88] +

VVVVVVVVCTCG -CTCTGC CTTCTG--CTCG [SEQ ID NO:89]

GAGCIIIIIIIIII

IIIIIIIIII..

After ligation with T4 DNA ligase:

GAGC-GAGACG GAAGAC-- [SEQ ID NO:88] +

-CTCTGC CTTCTG--CTCG [SEQ ID NO:89]

..VVVVVVVVGAGCIIIIIIIIII [SEQ ID NO:90]
VVVVVVVVCTCGIIIIIIIIII.. [SEQ ID NO:91]
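The cut-and-ligate logic of Method 1 can be checked with a short Python sketch: the two outer pieces can be joined because their 5' overhangs are reverse complements of each other, and the joined product no longer carries the recognition sequences, which leave with the excised middle piece. The sketch assumes the convention used above (lower strands printed 3'->5') and treats V and I purely as placeholders; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def compatible(overhang_a: str, overhang_b: str) -> bool:
    """Two 5' overhangs, each written 5'->3', can anneal if one is the
    reverse complement of the other."""
    return overhang_a == revcomp(overhang_b)


# Right-hand piece: GAGC overhang on the upper strand (already 5'->3').
right_overhang = "GAGC"
# Left-hand piece: CTCG on the lower strand, printed 3'->5' above, so reverse it.
left_overhang = "CTCG"[::-1]  # "GCTC"

print(compatible(right_overhang, left_overhang))  # True

# Upper strand of the ligation product: outer pieces joined, middle piece gone.
ligated_upper = "VVVVVVVV" + "GAGCIIIIIIIIII"
print(ligated_upper)  # VVVVVVVVGAGCIIIIIIIIII
print("GAAGAC" in ligated_upper, "GAGACG" in ligated_upper)  # False False
```

The same check applies to any pair of sticky ends: only the overhang sequences matter for compatibility, not the sequences of the enzymes that produced them.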

Method 2

One IIS enzyme that generates two 3' 3-base overhangs (BsaXI):

5'..VVVVVVVVGAG- AC CTCC GAGIIIIIIIIII 3' [SEQ ID NO:92]

3' VVVVVVVVCTC TG GAGG -CTCIIIIIIIIII..5' [SEQ ID NO:93]

After cleavage with BsaXI:

..VVVVVVVVGAG + AC CTCC GAG [SEQ ID NO:94] +
VVVVVVVV CTC TG GAGG [SEQ ID NO:95]

IIIIIIIIII
CTCIIIIIIIIII..

After ligation with T4 DNA ligase:

AC CTCC GAG [SEQ ID NO:94] +

CTC TG GAGG [SEQ ID NO:95]

..VVVVVVVVGAGIIIIIIIIII
VVVVVVVVCTCIIIIIIIIII..

Method 3

One IIS enzyme that generates blunt ends (MlyI):

5'..VVVVVVVV GAGTC IIIIIIIIII 3' [SEQ ID NO:96]

3' VVVVVVVV CTGAG IIIIIIIIII..5' [SEQ ID NO:96]

After cleavage with MlyI:

..VVVVVVVV + GAGTC [SEQ ID NO:97] +

VVVVVVVV CTGAG [SEQ ID NO:97]

IIIIIIIIII
IIIIIIIIII..

After ligation with T4 DNA ligase:

GAGTC [SEQ ID NO:97] +

CTGAG [SEQ ID NO:97]

..VVVVVVVVIIIIIIIIII
VVVVVVVVIIIIIIIIII..

Clean copy of the amended specification (paragraph on page 71 at 
line 14-page 72 at line 4) 

Based upon the overhang pairs, a set of five library components
was made by annealing complementary oligonucleotides in separate
tubes:
signal 1:

5'-TAATACGACTCACTATACCACAAGTTTGTACAAAAAAGCAGGCTCTATTC-3' [SEQ ID
NO:22]

and

5'-TAGGAAGAATAGAGCCTGCTTTTTTGTACAAACTTGTGGTATAGTGAGTCGTATTA-3'
[SEQ ID NO:23];
signal 2:

5'-TTCCTATGCAGTGGACCACTTTGTACAAGAAAGCTGGGTTGCAGT-3' [SEQ ID NO:24]
and 5'-GCAACTACTGCAACCCAGCTTTCTTGTACAAAGTGGTCCACTGCA-3' [SEQ ID
NO:25];
signal 3:

5'-AGTTGCTTGACGCCACAAGTTTGTACAAAAAAGCAGGCTTTGACG-3' [SEQ ID NO:26]
and 5'-CGACATCGTCAAAGCCTGCTTTTTTGTACAAACTTGTGGCGTCAA-3' [SEQ ID
NO:27];
signal 4:

5'-ATGTCGAAGGGCGGACCACTTTGTACAAGAAAGCTGGGTAAGGGC-3' [SEQ ID NO:28]
and 5'-GACAGGGCCCTTACCCAGCTTTCTTGTACAAAGTGGTCCGCCCTT-3' [SEQ ID
NO:29];
signal 5:

5'-CCTGTCATGTGGACCACTTTGTACAAGAAAGCTGGGTTTCTATAGTGTCACCTAAATC-3'
[SEQ ID NO:30] and
5'-GATTTAGGTGACACTATAGAAACCCAGCTTTCTTGTACAAAGTGGTCCACAT-3' [SEQ ID
NO:31];

T7: 5'-TAATACGACTCACTATACCA-3' [SEQ ID NO:32];

T7-Cy5 primer: 5'-TAATACGACTCACTATA-3' [SEQ ID NO:33]; and

SP6 primer: 3'-AAGATATCACAGTGGATTTAG-5' [SEQ ID NO:34].

The library components (4 pmol each) were then mixed together and
ligated using 100 U T4 DNA ligase (NEB) in 1X ligase buffer at
25 °C for 15 minutes. The ligase was then inactivated at 65 °C for
20 min.
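Whether the five components can only assemble in the order 1-2-3-4-5 depends on each component's free end matching the next one. The Python sketch below checks this for the sequences listed above. It assumes, as read off the sequences rather than stated in the text, that every junction uses a 6-nt 5' overhang at the start of the lower strand of component i and the upper strand of component i+1; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# (upper strand, lower strand) of signals 1-5, both written 5'->3' as listed above.
SIGNALS = [
    ("TAATACGACTCACTATACCACAAGTTTGTACAAAAAAGCAGGCTCTATTC",
     "TAGGAAGAATAGAGCCTGCTTTTTTGTACAAACTTGTGGTATAGTGAGTCGTATTA"),
    ("TTCCTATGCAGTGGACCACTTTGTACAAGAAAGCTGGGTTGCAGT",
     "GCAACTACTGCAACCCAGCTTTCTTGTACAAAGTGGTCCACTGCA"),
    ("AGTTGCTTGACGCCACAAGTTTGTACAAAAAAGCAGGCTTTGACG",
     "CGACATCGTCAAAGCCTGCTTTTTTGTACAAACTTGTGGCGTCAA"),
    ("ATGTCGAAGGGCGGACCACTTTGTACAAGAAAGCTGGGTAAGGGC",
     "GACAGGGCCCTTACCCAGCTTTCTTGTACAAAGTGGTCCGCCCTT"),
    ("CCTGTCATGTGGACCACTTTGTACAAGAAAGCTGGGTTTCTATAGTGTCACCTAAATC",
     "GATTTAGGTGACACTATAGAAACCCAGCTTTCTTGTACAAAGTGGTCCACAT"),
]

OVERHANG = 6  # assumption: 6-nt 5' overhangs, inferred from the sequences


def junctions_ok(signals, k=OVERHANG):
    """One boolean per neighbouring pair: the 5' overhang of the lower strand
    of signal i must be the reverse complement of the 5' overhang of the
    upper strand of signal i+1."""
    return [revcomp(lower[:k]) == next_upper[:k]
            for (_upper, lower), (next_upper, _lower) in zip(signals, signals[1:])]


print(junctions_ok(SIGNALS))  # [True, True, True, True]
```

Because each overhang is distinct, a mismatch at any junction would show up as a False in the corresponding position, which makes this a quick consistency check on the listed oligonucleotides.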

Clean copy of the amended specification (paragraph on page 73 at 
lines 10-26) 

Materials:

Oligonucleotides are selected which bind to the fragment chain
and also serve as primers. Thus, for example, adjacent chains
may be bound using the following primer pairs:

fragment chain 2 terminal (with bound primer):
5' TTCTATAGTGTCACCTAAATC 3' [SEQ ID NO:35]

3' AAGATATCACAGTGGATTTAGCCTACCAGTACATCCAACGGCAACT 5' [SEQ ID NO:36]
fragment chain 3 terminal (with bound primer):

5' GTCATGTAGGTTGCCGTTGATCCATCCTAATACGACTCACTATAGCA 3' [SEQ ID NO:37]

3' ATTATGCTGAGTGATATCGT 5' [SEQ ID NO:38]



The above exemplified primer regions are complementary and may 
thus be bound together. 
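One way to locate the complementary primer regions is to complement one strand and look for the longest run it shares with the other. Because the chain 2 lower strand above is printed 3'->5' and the chain 3 upper strand 5'->3', base i of one faces base i of the other, so no reversal is needed. A minimal Python sketch, assuming perfect Watson-Crick pairing with no mismatches; the helper name is illustrative:

```python
PAIR = str.maketrans("ACGT", "TGCA")


def longest_common_run(a: str, b: str) -> tuple[int, int, int]:
    """Longest common substring of a and b as (start in a, start in b, length),
    via plain O(len(a) * len(b)) dynamic programming."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


chain2_lower_3to5 = "AAGATATCACAGTGGATTTAGCCTACCAGTACATCCAACGGCAACT"
chain3_upper_5to3 = "GTCATGTAGGTTGCCGTTGATCCATCCTAATACGACTCACTATAGCA"

# Complementing (without reversing) gives the 5'->3' sequence that would pair
# with chain 2's lower strand.
partner = chain2_lower_3to5.translate(PAIR)
start3, start2, length = longest_common_run(chain3_upper_5to3, partner)
print(length, chain3_upper_5to3[start3:start3 + length])
# 20 GTCATGTAGGTTGCCGTTGA
```

On these strands the shared run is 20 bases. The first 21 bases of the complemented strand also reproduce the chain 2 primer [SEQ ID NO:35], a quick sanity check that the orientation convention has been read correctly.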
Clean copy of the amended specification (paragraph on page 75 at
lines 12-18)

Gene A has the following sequence at its first and last five
bases (marked by underlining).

5'...GCTGGAGGCCTCCACTATGAAATCGCGTAGAG... [SEQ ID NO:80]

3'...CGACCTCCGGAGGTGATACTTTAGCGCATC [SEQ ID NO:98]

CTGGCGGAA AATGA GAAAATTCGACCTA...3' [SEQ ID NO:81]

...ACGACCGCCTTTTACTCTTTTAAGCTGG 5' [SEQ ID NO:99]

Clean copy of the amended specification (paragraph on page 76 at 
line 1-page 77 at line 2) 

Materials:

Initiation linker 1 (s):

5' ATT CGG TCG AGA TGC TCT CA 3' [SEQ ID NO:39]
Initiation linker 1 (as):

5' CGA CTG AGA GCA TCT CGA CCG AAT 3' [SEQ ID NO:40]
Initiation linker 2 (s):

5' GCG TTA CTG AGC GTA GCT CTG 3' [SEQ ID NO:41]
Initiation linker 2 (as):

5' CTC TCA GAG CTA CGC TCA GTA ACG C 3' [SEQ ID NO:42]
Propagation linker (s):

5' TGC TGC AGG AGC GAA TCT CNN NNN 3' [SEQ ID NO:43]
Propagation linker (as):

5' GAG ATT CGC TCC TGC AGC A 3' [SEQ ID NO:44]
Labeling linker 2 (s):

5' CTC TTG CTA TAG TGA GTC GTA TTA 3' [SEQ ID NO:45]
Labeling linker 2 (as):

5' TAA TAC GAC TCA CTA TAG CA 3' [SEQ ID NO:46]
Termination linker 1 (s):

5' AAG AGC TCA GGT CAT TGA CGT AGC TAT GAA 3' [SEQ ID NO:47]
Termination linker 1/2 (as):

5' AGC TAC GTC AAT GAC CTG AG 3' [SEQ ID NO:48]

Termination linker 1 (short version):
5' AAG AGA TGA A 3' [SEQ ID NO:49]

Termination linker 2 (s):

5' ACC GCT CAG GTC ATT GAC GTA GCT TCA TT 3' [SEQ ID NO:50]

Termination linker 2 (short version):
5' ACC GTC ATT 3'
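Each (s)/(as) pair above anneals into a short duplex with single-stranded tails. The Python sketch below is a hypothetical helper, not something described in the source: it finds the longest perfectly complementary stretch between the two strands and reports what is left unpaired. It ignores mismatches and alternative alignments, and treats N as pairing only with N.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")  # N (any base) pairs with N


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_common_run(a: str, b: str) -> tuple[int, int]:
    """(start in a, length) of the longest common substring of a and b,
    via O(len(a) * len(b)) dynamic programming."""
    best_start, best_len = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_start, best_len = i - cur[j], cur[j]
        prev = cur
    return best_start, best_len


def duplex_tails(sense: str, antisense: str) -> dict[str, str]:
    """Unpaired tails left when sense and antisense (both written 5'->3',
    spaced in triplets as above) anneal over their longest complementary stretch."""
    s = sense.replace(" ", "")
    a = antisense.replace(" ", "")
    i, n = longest_common_run(s, revcomp(a))  # paired stretch on the sense strand
    k = a.find(revcomp(s[i:i + n]))           # same stretch on the antisense strand
    return {"sense 5'": s[:i], "sense 3'": s[i + n:],
            "antisense 5'": a[:k], "antisense 3'": a[k + n:]}


print(duplex_tails("ATT CGG TCG AGA TGC TCT CA",
                   "CGA CTG AGA GCA TCT CGA CCG AAT"))
print(duplex_tails("AAG AGC TCA GGT CAT TGA CGT AGC TAT GAA",
                   "AGC TAC GTC AAT GAC CTG AG"))
```

Run on Initiation linker 1, it reports a four-base 5' tail (CGAC) on the antisense strand. On Termination linker 1, it reports a 5' tail AAGAG and a 3' tail ATGAA on the sense strand.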

REMARKS 

In response to a Notification of Missing Requirements under
35 U.S.C. §371 dated March 22, 2002 (a copy is attached), an
initial Sequence Listing is submitted, and its entry into the
application is respectfully requested. Pursuant to 37 CFR
§ 1.821(e), an initial computer-readable form of the Sequence
Listing is also submitted, and it is hereby certified that the
contents of the paper and computer-readable copies of the
Sequence Listing are identical and contain no new matter.

The specification has been amended to properly include the
sequence identifiers and to correct obvious typographical errors.
Marked-up copy of the amended specification (paragraph on page 14
at line 26-page 15 at line 11) 

To increase the number of permutations in an adapter 
library, two separate oligonucleotide libraries may be generated, 
one with single stranded oligonucleotides with regions that will 
correspond to the single stranded region of the first nucleic 
acid molecule fragment and the second library with single 
stranded oligonucleotides with regions that will correspond to 
the single stranded region of the second nucleic acid molecule 
(e.g. vector). However, common to each member of the library
is a complementary region, such that when one member from the
first library is selected and combined with a member of the
second library, they will hybridize, leaving free the relevant
single stranded regions. Thus, for example, to generate an adapter
with an AA overhang and a TC overhang to bind to the first and
second nucleic acid molecules respectively, members of the
different libraries such as GG<G>CCCCCNNAA [SEQ ID NO:1] may be
combined with 3'-TCNNNCCGGGG-5' [SEQ ID NO:2] to form:
GGCCCCCNNAA<,> [SEQ ID NO:1]
TCNNNCCGGGG [SEQ ID NO:2]
which exhibits the appropriate overhangs. When using only two
16-member libraries, this allows the production of 256 different
adapters.

Marked-up copy of the amended specification (paragraph on page 16 
at line 20-page 17 at line 11) 

Over 100 classes of IIS restriction endonucleases have been
identified, and there are large variations both with respect to
substrate specificity and cleavage pattern. In addition, these
enzymes have proved to be well suited to "module swapping"
experiments, so that one can create new enzymes for particular
requirements (Huang B. et al., J. Protein Chem. 1996, 15(5):481-9;
Bickle T.A., 1993, in Nucleases (2nd edn); Kim Y.G. et al., PNAS
1994, 91:883-887). In these experiments the binding domain of
transcription factor Sp1 was merged with the cleavage domain of
FokI to construct a class IIS restriction endonuclease that makes
a 4-base overhang at Sp1 sites. In other experiments a class
IIS restriction endonuclease that cuts outside the binding sites
of transcription factor Ultrabithorax was generated.
Corresponding experiments have been conducted on class I enzymes.
By merging the N-terminal part of the hsdS sub-unit of StyR 1241
(which recognizes GAAN6RTCG [SEQ ID NO:82]) with the C-terminal
part of the hsdS sub-unit of StyR 1241 (which recognizes
TCAN7RTTC [SEQ ID NO:83]), a new enzyme that recognizes the
sequence GAAN6RTTC [SEQ ID NO:84] was constructed. Several other
experiments have been carried out with similar success. Unlike
in the case of ordinary class II enzymes, it is therefore
reasonable to assume that a number of new IIS and IP restriction
enzymes can be constructed and adapted to cloning requirements
that may arise in the future. Very many combinations and
variants of these enzymes can therefore be used according to the
principles described herein.

Marked-up copy of the amended specification (paragraph on page 44
at line 30-page 45 at line 25)

The following examples are given by way of illustration only, in
which the Figures referred to are as follows:

Figure 1 shows a schematic representation of how the method of
the invention may be used to introduce an insert into a vector, 
in which the insert is cleaved from the first nucleic acid 
molecule, associated with adapters and ligated thereto and then 
ligated into the vector; 

Figure 2 shows the production of a fragment chain using 8 "0" and
"1" starting fragments with different overhangs (aaaaaaaaaa [SEQ
ID NO:100], aaaaaaaaac [SEQ ID NO:54], aaaaaaaccg [SEQ ID NO:57],
ccccccccccgg [SEQ ID NO:59], cccccccccgcg [SEQ ID NO:56],
cccccccccttt [SEQ ID NO:53], ggggggggaaa [SEQ ID NO:51],
ggggggggaac [SEQ ID NO:52], ggggggggccg [SEQ ID NO:55],
ttttttttcgg [SEQ ID NO:60], ttttttttgcg [SEQ ID NO:58],
ttttttttttt [SEQ ID NO:101]);

Figure 3 shows the production of a 64-fragment chain in which 8
chains are produced comprising 8 fragments each, in which the
termini of chains 1 and 2, and 2 and 3 etc. are complementary
such that they may be ligated together (aaaaaaaaaa [SEQ ID
NO:100], aaaaaaaaaaaaa [SEQ ID NO:102], aaaggggggggaaa [SEQ ID
NO:61], aacaaaaaaaaaa [SEQ ID NO:62], aacggggggggaaa [SEQ ID
NO:103], cttccccccccccg [SEQ ID NO:104], cttttttttttcg [SEQ ID
NO:105], ggggggggaaa [SEQ ID NO:51], gttccccccccccg [SEQ ID NO:65],
gttttttttttcg [SEQ ID NO:66], tttccccccccccg [SEQ ID NO:63],
tttttttttttcg [SEQ ID NO:64]);

Figure 4 shows 3 techniques for mixing "0", "1" fragments from a
library of fragments ordered for each position, in which in A)
appropriate fragments are selected by aspiration from appropriate
wells, B) appropriate fragments are released from the library
wells and C) a flow cytometer is used to direct appropriate
droplets to the mixing chamber;

Figure 5 shows PCR amplification of signal chain 1-0-1-0-0 using
SP6 and T7 primers. Lane 1: 1 µg of 1 kb DNA ladder (Gibco BRL),
Lane 2: 10 µl of PCR amplified fragment chain DNA using SP6 and
T7 primers. Lane 3: Same as lane 2 except for the use of SP6 and
T7-Cy5 primers; and

Figure 6 shows the use of primer pairs during the process of 
amplification to join together fragment chains. 

Marked-up copy of the amended specification (paragraph on page 48 
at lines 21-34) 

Materials:

Oligonucleotides used to address PhiX174 overhangs:

BbvI overhang 1a:

5'- CGA GCG CCT CCA GTG CAG CGG AG [SEQ ID NO:3]

BbvI overhang 5a:

5'- TATC GCG CCT CCA GTG CAG CGG AG [SEQ ID NO:4]

BbvI overhang 6b:

5'- CTCT GCG CCT CCA GTG CAG CGG AG [SEQ ID NO:5]

BbvI overhang 6 (delC):

5'- CTCT CTC CGC TGC ACT GGA GGC GC [SEQ ID NO:6]

BbvI overhang 7a:

5'- CAAC GCG CCT CCA GTG CAG CGG AG [SEQ ID NO:7]

BbvI overhang 9b:

5'- GGTA GCG CCT CCA GTG CAG CGG AG [SEQ ID NO:8]

Marked-up copy of the amended specification (paragraph on page 49 
at lines 1-5) 

Oligonucleotides used to address pUC19 overhangs:
Cloning site 1a

5'- AAGAG CTC CGC TGC ACT GGA GGC GC [SEQ ID NO:9]
Cloning site 1b

5'- CTCTT CTC CGC TGC ACT GGA GGC GC [SEQ ID NO:10]

Marked-up copy of the amended specification (paragraph on page 53 
at line 11-page 54 at line 6) 

In this Example, the location of the binding motifs of the 
initiation linkers is shown below: 



FokI    GGATG
Bst71I  GCAGC
HgaI    GACGC
BplI    GAG CTC
BaeI    CYATG CA
CjeI    CCA GT
HaeIV   GAY RTC

Consensus: GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC [SEQ ID NO:11]

Initiation linkers:

X=0: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGPPPPPP [SEQ ID NO:12]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC [SEQ ID NO:69]

X=1: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATG-PPPPPP [SEQ ID NO:13]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC- [SEQ ID NO:70]

X=2: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATG--PPPPPP [SEQ ID NO:14]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC-- [SEQ ID NO:71]

X=3: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATG PPPPPP [SEQ ID NO:15]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC [SEQ ID NO:72]

X=4: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGCPPPPPP [SEQ ID NO:16]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG [SEQ ID NO:73]

X=5: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC-PPPPPP [SEQ ID NO:17]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG- [SEQ ID NO:74]

X=6: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC--PPPPPP [SEQ ID NO:18]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG-- [SEQ ID NO:75]

X=7: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC PPPPPP [SEQ ID NO:19]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG [SEQ ID NO:76]

X=8: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC PPPPPP [SEQ ID NO:20]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG [SEQ ID NO:77]

X=9: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC PPPPPP [SEQ ID NO:21]
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG [SEQ ID NO:78]



Marked-up copy of the amended specification (paragraph on page 54 
at lines 21-35) 

Propagation linkers:

FokI:   5' GGATG
        3' CCTACNNNN

Bst71I: 5' GCAGC
        3' CGTCGNNNN

HgaI:   5' GACGC
        3' CTGCGNNNNN [SEQ ID NO:79]

SplI:   5' GAG CTCNNNNN
        3' CTC GAG

BaeI:   5' CCATG CANNNNN
        3' GGTAC GT

HaeIV:  5' GAC GTCNNNNNN
        3' CTG CTG

CjeI:   5' CCA GTNNNNNN
        3' GGT CA

Marked-up copy of the amended specification (paragraph on page 55
at lines 28-36)

The 3'-GAGTGC overhang is then ligated with the X=3 initiation
linker and the GTGAA-3' overhang is ligated with the CACTT-3'
overhang on the target DNA molecule:

5'--GCAGCGACCATGAGTCCA-CTC--GTGGATG PPPPPP [SEQ ID NO:15]

3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC QQQQQQ [SEQ ID NO:85]

GTGAA 3'

CACTT 5'

Marked-up copy of the amended specification (paragraphs on page 
56 at line 15-page 58 at line 7) 

Method 1 

Two IIS enzymes that generate 4-base 5' overhangs (BbsI and
Esp3I):

5'..VVVVVVVVGAGC-GAGACG-GAAGAC-GAGCIIIIIIIIII 3' [SEQ ID NO:86]

3' VVVVVVVVCTCG-CTCTGC-CTTCTG-CTCGIIIIIIIIII..5' [SEQ ID NO:87]

After cleavage with BbsI and Esp3I:

..VVVVVVVV + GAGC-GAGACG-GAAGAC- [SEQ ID NO:88] +

VVVVVVVVCTCG -CTCTGC-CTTCTG-CTCG [SEQ ID NO:89]

IIIIIIIIII..

After ligation with T4 DNA ligase:

GAGC-GAGACG-GAAGAC- [SEQ ID NO:88] +

-CTCTGC-CTTCTG-CTCG [SEQ ID NO:89]

..VVVVVVVVGAGCIIIIIIIIII [SEQ ID NO:90]
VVVVVVVVCTCGIIIIIIIIII.. [SEQ ID NO:91]

Method 2 

One IIS enzyme that generates two 3-base 3' overhangs (BsaXI):

5'..VVVVVVVVGAG AC CTCC GAGIIIIIIIIII 3' [SEQ ID NO:92]

3' VVVVVVVVCTC TG GAGG CTCIIIIIIIIII..5' [SEQ ID NO:93]

After cleavage with BsaXI:

..VVVVVVVVGAG + AC CTCC GAG [SEQ ID NO:94]

VVVVVVVV CTC TG GAGG [SEQ ID NO:95]

+ IIIIIIIIII

CTCIIIIIIIIII..

After ligation with T4 DNA ligase:

CTC

GAG [SEQ ID NO:94]

[SEQ ID NO:95]

+

..VVVVVVVVGAGIIIIIIIIII
VVVVVVVVCTCIIIIIIIIII..

Method 3 

One IIS enzyme that generates blunt ends (MlyI):

5'..VVVVVVVV GAGTC IIIIIIIIII 3' [SEQ ID NO:96]

3' VVVVVVVV CTGAG IIIIIIIIII..5' [SEQ ID NO:96]

After cleavage with MlyI:

..VVVVVVVV + GAGTC [SEQ ID NO:97] +

VVVVVVVV CTGAG [SEQ ID NO:97]

IIIIIIIIII
IIIIIIIIII..

After ligation with T4 DNA ligase:

GAGTC [SEQ ID NO:97] +

CTGAG [SEQ ID NO:97]

..VVVVVVVVIIIIIIIIII
VVVVVVVVIIIIIIIIII..

Marked-up copy of the amended specification (paragraph on page 71 
at line 14-page 72 at line 4) 
Based upon the overhang pairs, a set of five library components
was made by annealing complementary oligonucleotides in separate
tubes:
signal 1:

5'-TAATACGACTCACTATACCACAAGTTTGTACAAAAAAGCAGGCTCTATTC-3' [SEQ ID NO:22]

and

5'-TAGGAAGAATAGAGCCTGCTTTTTTGTACAAACTTGTGGTATAGTGAGTCGTATTA-3' [SEQ ID NO:23];
signal 2:

5'-TTCCTATGCAGTGGACCACTTTGTACAAGAAAGCTGGGTTGCAGT-3' [SEQ ID NO:24]
and 5'-GCAACTACTGCAACCCAGCTTTCTTGTACAAAGTGGTCCACTGCA-3' [SEQ ID NO:25];
signal 3:

5'-AGTTGCTTGACGCCACAAGTTTGTACAAAAAAGCAGGCTTTGACG-3' [SEQ ID NO:26]
and 5'-CGACATCGTCAAAGCCTGCTTTTTTGTACAAACTTGTGGCGTCAA-3' [SEQ ID NO:27];
signal 4:

5'-ATGTCGAAGGGCGGACCACTTTGTACAAGAAAGCTGGGTAAGGGC-3' [SEQ ID NO:28]
and 5'-GACAGGGCCCTTACCCAGCTTTCTTGTACAAAGTGGTCCGCCCTT-3' [SEQ ID NO:29];
signal 5:

5'-CCTGTCATGTGGACCACTTTGTACAAGAAAGCTGGGTTTCTATAGTGTCACCTAAATC-3' [SEQ ID NO:30] and

5'-GATTTAGGTGACACTATAGAAACCCAGCTTTCTTGTACAAAGTGGTCCACAT-3' [SEQ ID NO:31];

T7: 5'-TAATACGACTCACTATACCA-3' [SEQ ID NO:32];

T7-Cy5 primer: 5'-TAATACGACTCACTATA-3' [SEQ ID NO:33]; and

SP6 primer: 3'-AAGATATCACAGTGGATTTAG-5' [SEQ ID NO:34].

The library components (4 pmol each) were then mixed together and
ligated using 100 U T4 DNA ligase (NEB) in 1X ligase buffer at
25 °C for 15 minutes. The ligase was then inactivated at 65 °C for
20 min.
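The five annealed pairs above can be checked by hand or with a short script. The sketch below is an illustration, not the procedure used in this example: it assumes each oligonucleotide pair anneals in register so that every end is either blunt or carries a 5' overhang, and the names `SIGNALS`, `overhangs` and `cohesive_joins` are hypothetical. Applied to the listed sequences, it derives one 6-base overhang at each internal junction and predicts the order signal 1, 2, 3, 4, 5, with a blunt end carrying the T7 sequence at one end and the SP6 primer site at the other.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overhangs(top: str, bottom: str):
    """Return (left, right) 5' overhangs of an annealed duplex.

    left  = unpaired 5' end of the top strand, read 5'->3'
    right = unpaired 5' end of the bottom strand, read 5'->3'
    Assumes in-register annealing with 5' overhangs or blunt ends only.
    """
    rc = revcomp(bottom)  # bottom strand rewritten in top-strand orientation
    for i in range(len(top)):
        if rc.startswith(top[i:]):  # top[i:] is the base-paired region
            paired = len(top) - i
            return top[:i], bottom[: len(rc) - paired]
    raise ValueError("oligos do not form a 5'-overhang duplex")


SIGNALS = {
    "signal 1": ("TAATACGACTCACTATACCACAAGTTTGTACAAAAAAGCAGGCTCTATTC",
                 "TAGGAAGAATAGAGCCTGCTTTTTTGTACAAACTTGTGGTATAGTGAGTCGTATTA"),
    "signal 2": ("TTCCTATGCAGTGGACCACTTTGTACAAGAAAGCTGGGTTGCAGT",
                 "GCAACTACTGCAACCCAGCTTTCTTGTACAAAGTGGTCCACTGCA"),
    "signal 3": ("AGTTGCTTGACGCCACAAGTTTGTACAAAAAAGCAGGCTTTGACG",
                 "CGACATCGTCAAAGCCTGCTTTTTTGTACAAACTTGTGGCGTCAA"),
    "signal 4": ("ATGTCGAAGGGCGGACCACTTTGTACAAGAAAGCTGGGTAAGGGC",
                 "GACAGGGCCCTTACCCAGCTTTCTTGTACAAAGTGGTCCGCCCTT"),
    "signal 5": ("CCTGTCATGTGGACCACTTTGTACAAGAAAGCTGGGTTTCTATAGTGTCACCTAAATC",
                 "GATTTAGGTGACACTATAGAAACCCAGCTTTCTTGTACAAAGTGGTCCACAT"),
}

ENDS = {name: overhangs(top, bottom) for name, (top, bottom) in SIGNALS.items()}


def cohesive_joins(ends):
    """Pairs (a, b) where the right overhang of a anneals to the left overhang of b."""
    return [(a, b)
            for a, (_, right_a) in ends.items()
            for b, (left_b, _) in ends.items()
            if right_a and left_b == revcomp(right_a)]


for name, (left, right) in ENDS.items():
    print(f"{name}: left={left or 'blunt'} right={right or 'blunt'}")
for a, b in cohesive_joins(ENDS):
    print(f"{a} -> {b}")
```

Blunt-end joining by T4 DNA ligase is ignored in this sketch; only cohesive joins between 5' overhangs are reported.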

Marked-up copy of the amended specification (paragraph on page 73 
at lines 10-26) 

Materials:

Oligonucleotides are selected which bind to the fragment chain
and also serve as primers. Thus, for example, adjacent chains
may be bound using the following primer pairs:

fragment chain 2 terminal (with bound primer):
5' TTCTATAGTGTCACCTAAATC 3' [SEQ ID NO:35]

3' AAGATATCACAGTGGATTTAGCCTACCAGTACATCCAACGGCAACT 5' [SEQ ID NO:36]
fragment chain 3 terminal (with bound primer):

5' GTCATGTAGGTTGCCGTTGATCCATCCTAATACGACTCACTATAGCA 3' [SEQ ID NO:37]

3' ATTATGCTGAGTGATATCGT 5' [SEQ ID NO:38]

The above exemplified primer regions are complementary and may 
thus be bound together. 

Marked-up copy of the amended specification (paragraph on page 75 
at lines 12-18) 

Gene A has the following sequence at its first and last five
bases (marked by underlining).

5'...GCTGGAGGCCTCCACT ATGAAA TCGCGTAGAG... [SEQ ID NO:80]
3'...CGACCTCCGGAGGTGATACTTTAGCGCATC [SEQ ID NO:98]






Serial No. 10/019,258 
September 23, 2002 
Page 27 

CTGGCGGAA AATGA GAAAATTCGACCTA...3' [SEQ ID NO:81]

...ACGACCGCCTTTTACTCTTTTAAGCTGG 5' [SEQ ID NO:99]

Marked-up copy of the amended specification (paragraph on page 76 
at line 1-page 77 at line 2) 

Materials:

Initiation linker 1 (s):

5' ATT CGG TCG AGA TGC TCT CA 3' [SEQ ID NO:39]
Initiator linker 1 (as):

5' CGA CTG AGA GCA TCT CGA CCG AAT 3' [SEQ ID NO:40]
Initiation linker 2 (s):

5' GCG TTA CTG AGC GTA GCT CTG 3' [SEQ ID NO:41]
Initiator linker 2 (as):

5' CTC TCA GAG CTA CGC TCA GTA ACG C 3' [SEQ ID NO:42]
Propagation linker (s):

5' TGC TGC AGG AGC GAA TCT CNN NNN 3' [SEQ ID NO:43]
Propagation linker (as):

5' GAG ATT CGC TCC TGC AGC A 3' [SEQ ID NO:44]
Labeling linker 2 (s):

5' CTC TTG CTA TAG TGA GTC GTA TTA 3' [SEQ ID NO:45]

Labeling linker 2 (as):

5' TAA TAC GAC TCA CTA TAG CA 3' [SEQ ID NO:46]
Termination linker 1 (s):

5' AAG AGC TCA GGT CAT TGA CGT AGC TAT GAA 3' [SEQ ID NO:47]
Termination linker 1/2 (as):

5' AGC TAC GTC AAT GAC CTG AG 3' [SEQ ID NO:48]

Termination linker 1 (short version):
5' AAG AGA TGA A 3' [SEQ ID NO:49]

Termination linker 2 (s):

5' ACC GCT CAG GTC ATT GAC GTA GCT TCA TT 3' [SEQ ID NO:50]

Termination linker 2 (short version):
5' ACC GTC ATT 3'
METHODS OF CLONING AND PRODUCING FRAGMENT CHAINS WITH READABLE INFORMATION



The present invention relates to new methods of
attaching first and second nucleic acid molecules,
particularly methods of cloning in which adapter
molecules mediate the binding between the first and
second molecules, to the resultant nucleic acid molecules
thus formed, to methods of generating DNA with a readily
readable information content and to kits for performing
such methods.

Presently known cloning methods generally involve
the use of restriction enzymes, which are used to
generate fragments for insertion and to cleave vectors to
produce corresponding and hence complementary terminal
sequences. Generally, the enzymes used cut palindromic
sequences and thus produce identical overhangs. Different
sequences that are cut with the same restriction
endonucleases can then be ligated together to form new,
recombinant nucleic acids.

However, such methods suffer from a number of
limitations. One disadvantage of using endonucleases
that form two identical overhangs is the formation of
different products on ligation. If, for example, two
fragments A and B are to be ligated, then as a
consequence of the common overhangs the products A+A and
B+B will be produced as well as the desired A+B. Other
by-products will also be generated from the other
fragments produced when A and B were formed, e.g. by
reassociation into the original positions. It is
therefore usual to separate the products on agarose gels.
The separation procedure, however, often results in a
considerable loss of DNA.

Such methods necessarily suffer from various
limitations, including the by-products mentioned above
and the need to identify the desired end-products, e.g.
if only a particular insert is to be cloned.

Other cloning techniques have used PCR, e.g. with PCR
primers that carry IIS enzyme recognition sites.
However, PCR is disadvantageous in cloning as it is time
consuming and requires purification steps which result
in a significant loss of yield. The PCR reaction may
also introduce point mutations and the like, and the
length of the fragment is limited by the capacity of the
polymerase, e.g. to a maximum of approximately 50 kb.

It has now surprisingly been found that many of
these disadvantages may be avoided by generating
fragments with unique single stranded regions and then
mediating the binding between a first and a second
nucleic acid molecule. In this method, restriction
nucleases are used that form non-identical overhangs,
e.g. type IP or IIS restriction endonucleases. As will
be appreciated, if one uses a restriction endonuclease
that makes overhangs of 4 bases, each fragment that is
formed will have two overhangs of 4 bases each. It is
therefore theoretically possible for 4^8 (i.e. 65,536)
fragments to be formed with different combinations of
the two overhangs. Thus, as a rule, each fragment formed
on cleavage will have a unique pair of overhangs, even
when large nucleic acid molecules are cleaved.
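The 4^8 figure can be turned into a rough estimate of how often a chosen fragment shares its overhang pair with another fragment in the same digest. The sketch below assumes, as a simplification not stated in the text, that overhang sequences are uniformly random and independent and that the two ends form an ordered (left, right) pair; real digests are not uniform, so the numbers are indicative only. The function names are hypothetical.

```python
def distinct_overhang_pairs(k: int) -> int:
    """Ordered (left, right) pairs of k-base overhangs: (4**k) ** 2."""
    return 4 ** (2 * k)


def p_pair_shared(n_fragments: int, k: int) -> float:
    """Chance that at least one of the other n_fragments - 1 fragments carries
    the same (left, right) overhang pair as one chosen target fragment,
    assuming uniformly random, independent overhangs (a simplification)."""
    m = distinct_overhang_pairs(k)
    return 1.0 - (1.0 - 1.0 / m) ** (n_fragments - 1)


print(distinct_overhang_pairs(4))  # 65536, i.e. 4**8
for n in (100, 1_000, 10_000):
    print(f"{n:>6} fragments: {p_pair_shared(n, 4):.2%} chance the target pair recurs")
```

Under this model a chosen fragment's pair is very likely unique in digests of hundreds to a few thousand fragments, while the chance of a repeat grows with the number of fragments.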

These unique overhangs may then be addressed and
adjusted appropriately using adapters with two
overhangs. For example, in a cloning technique one of
the overhangs is made to correspond to the overhang on
the insert and the other overhang is made to correspond
to the overhang on the vector into which the insert is
to be introduced. This method is outlined in Figure 1.
In that case the DNA molecule containing the insert is
cut with a restriction endonuclease which makes an
overhang on each side of the insert. Each of the many
fragments formed has different overhangs, such that the
two overhangs at either end of the insert are unique.
Ligase is then added to bind two adapters with
corresponding single stranded regions. This leads to
the formation of two new overhangs at the termini of the
insert, which are selected such that they can be used to
bind to the vector into which the insert is to be
cloned. Providing identical overhangs are not created
on other molecules, only the desired insert will be
ligated to the adapters. In the final step the insert
is ligated into the vector, which has two overhangs that
complement the adapters' overhangs. The overhangs in
the vector may be constructed using the same principles
as described for the insert.

Thus in this new method, an adapter molecule is
used which is complementary to a single stranded region
generated on the first nucleic acid molecule and
therefore binds to that molecule, but which has a
different single stranded region at its other terminus,
thus effectively modifying the single stranded region
presented for binding by the first nucleic acid molecule
fragment. The adapter's free single stranded region may
then mediate the binding of the first nucleic acid
molecule fragment to a second nucleic acid molecule
exhibiting a complementary single stranded region.

This method of mediation has particular
applications for effectively identifying and selecting a
first nucleic acid molecule fragment and then mediating
its binding to a second nucleic acid molecule where this
was not previously possible.

Of particular relevance to methods of cloning is
the generation of fragments for cloning which have
different single stranded regions at their termini
relative to other fragments, which may then be selected
and cloned into an appropriate vector. As described
herein, such fragments are generated by the use of
enzymes which cleave outside their recognition site and
thus produce overhangs that depend on the sequence
surrounding the recognition site, which is likely to
vary from fragment to fragment.
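The sequence dependence of such overhangs can be illustrated with a short sketch. It assumes the usual SITE(n1/n2) notation for type IIS enzymes, in which the top strand is cut n1 bases and the bottom strand n2 bases 3' of the recognition site, and takes BbsI as GAAGAC(2/6); the two input sequences and the function name are hypothetical. The same site in two contexts yields two different 4-base overhangs.

```python
def type_iis_overhangs(seq: str, site: str, n1: int, n2: int):
    """Return (top_cut, bottom_cut, overhang) for each forward-oriented site.

    With n2 > n1 the cut leaves a 5' overhang; `overhang` is that overhang on
    the right-hand fragment, read 5'->3' on the top strand. Only the top
    strand is scanned; a full tool would also search revcomp(site).
    """
    hits = []
    start = seq.find(site)
    while start != -1:
        end = start + len(site)
        top_cut, bottom_cut = end + n1, end + n2
        if bottom_cut <= len(seq):  # skip sites too close to the molecule end
            hits.append((top_cut, bottom_cut, seq[top_cut:bottom_cut]))
        start = seq.find(site, start + 1)
    return hits


# Same BbsI site, different flanking bases (hypothetical sequences).
for s in ("TTGAAGACAAGTCAGCTT", "TTGAAGACTTCCGAGCTT"):
    print(s, type_iis_overhangs(s, "GAAGAC", 2, 6))
```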

Such techniques may be used to direct only a single 
fragment to a particular vector or may be used to direct 
different fragments to different sites or indeed 
different vectors, even within the same reaction mix, 
providing appropriate adapters are constructed. 

These methods have particular advantages over prior 
art methods. In particular, the whole procedure may be 
carried out in one or two steps, e.g. cutting and 
ligating simultaneously or cutting and ligating 
separately. Even in instances where the procedure is 
performed in two steps, it will often be possible to 
perform both steps in the same buffer, e.g. since T4 DNA
ligase is known to work well in most buffers for
restriction endonucleases. Time- and resource-consuming
precipitation procedures may therefore be avoided.
Moreover, ligations can be performed with overhangs of 
4-6 bases, unlike conventional cloning where overhangs 
of 0-4 bases are used, thereby increasing ligation 
efficiency considerably. 

Furthermore, the need to carry out gel separations 
may be avoided. The quantity of DNA required initially 
can be reduced substantially. Mutation of DNA molecules 
on UV exposure, a common occurrence in gel separation, 
may also be avoided. Furthermore, laboratory staff are 
not exposed to carcinogenic EtBr. Also, separation 
problems which can occur when restriction cleavage 
results in fragments of similar size may be avoided. 
The frequency of undesirable side-products such as empty
vectors, too many inserts or incorrect orientation of
the inserts may also be reduced.

Since it is generally not problematic if the insert
is cleaved, a small selection, e.g. of type IIS or IP
restriction endonucleases, could provide far more cloning
possibilities than a corresponding selection of ordinary
type II restriction endonucleases used for conventional
cloning procedures. Having a few type IIS, IP and
similar restriction endonucleases that cleave with high
frequency allows for many cloning possibilities.

In the specific instance of cloning of large DNA 
molecules (e.g. genomic DNA) or a solution containing 
many different DNA molecules in parallel (e.g. a cDNA 
library) it is very difficult to use conventional 
methods. If for example a large DNA molecule is cleaved
with EcoRI, a large number of fragments may be formed
with the same overhang, and in addition a considerable 
proportion of these fragments may be of roughly the same 
size. This may lead to the formation of a large number 
of undesired ligation products, even with gel 
separation. Moreover, gel separation can be difficult 
if the insert is large. Furthermore, it is also often 
difficult, or even impossible, to find restriction 
endonucleases that will not cut large inserts. These 
problems may be reduced or eliminated using the cloning
procedure described herein.

If necessary, it is possible to increase the number
of bases in the overhangs to, e.g., 6 by using CjeI
or similar endonucleases, forming an even greater number
of possible combinations and thus increasing the
probability of producing unique overhangs.
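To see what longer overhangs buy, the sketch below estimates how many fragments a digest can contain before a chosen fragment's (left, right) overhang pair has a 1% chance of recurring elsewhere. It rests on the simplifying assumption, not made in the text, that overhangs are uniformly random and independent. Moving from 4-base to 6-base overhangs multiplies the number of overhang pairs by 4^4 = 256, and the tolerable fragment count rises by roughly the same factor. `max_fragments` is a hypothetical helper.

```python
import math


def max_fragments(k: int, risk: float = 0.01) -> int:
    """Largest n with 1 - (1 - 1/m)**(n - 1) <= risk, where m = 4**(2k) is the
    number of ordered (left, right) pairs of k-base overhangs.
    Assumes uniformly random, independent overhangs."""
    m = 4 ** (2 * k)
    # (n - 1) * log(1 - 1/m) >= log(1 - risk); both logarithms are negative
    return 1 + math.floor(math.log1p(-risk) / math.log1p(-1.0 / m))


for k in (4, 6):
    print(f"{k}-base overhangs: about {max_fragments(k):,} fragments at 1% risk")
print(max_fragments(6) / max_fragments(4))  # roughly 256
```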

The advantages of the method of the invention are 
even greater in complex cloning procedures. If several 
adapters are used for example, it is possible to clone 
many different inserts into one and the same vector at a 
corresponding number of different sites in one and the 
same reaction, as described hereinafter in more detail. 

Deletions of small or large fragments may also be
achieved using the same basic principle. This opens up
the possibility of making complex recombinations of,
inter alia, genomic DNA (removal of endogenous viruses
from genomes to be used for xenotransplantation, the
insertion of a large number of genes from other genomes,
new combinations of genes, etc.). The method can also be
used for exon shuffling and other recombinations that
are relevant in connection with artificial evolutionary
systems.

Thus, in a first aspect, the present invention 
provides a method of attaching a fragment of a first 
nucleic acid molecule to a second nucleic acid molecule, 
wherein said method comprises at least the steps: 

1) cleaving said first nucleic acid molecule with a
nuclease which has a cleavage site separate from its
recognition site to create at least one fragment of said
first nucleic acid molecule having a single stranded
nucleotide region (SS1a) at at least one terminus of
said fragment,

2) if necessary generating a single stranded
nucleotide region (SS2) at at least one terminus of said
second nucleic acid molecule,

3) binding to at least one single stranded region of
step 1) (SS1a) an adapter molecule comprising at one
terminus a single stranded region (SSA1) complementary
to the single stranded region of said first nucleic acid
molecule fragment (SS1a) and additionally comprising at
the other terminus a further single stranded region
(SSA2) complementary to the single stranded region (SS2)
at one terminus of said second nucleic acid molecule,

4) ligating said adapter to said first nucleic acid
fragment,

5) binding said adapter to said second nucleic acid
molecule, and

6) ligating said adapter to said second nucleic acid
molecule.
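The logic of steps 1) to 6) can be sketched in a few lines. The sketch assumes that all single stranded regions are 5' overhangs written 5' to 3' on their own strand, so that two regions anneal when one is the reverse complement of the other; the overhang sequences and function names are hypothetical. It shows a fragment end SS1a that cannot bind SS2 directly, and the adapter overhangs SSA1 and SSA2 that bridge them.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals(oh1: str, oh2: str) -> bool:
    """True if two 5' overhangs (each read 5'->3') are fully complementary."""
    return len(oh1) == len(oh2) and oh2 == revcomp(oh1)


def adapter_overhangs(ss1a: str, ss2: str):
    """(SSA1, SSA2): SSA1 pairs with the fragment end SS1a, SSA2 with SS2."""
    return revcomp(ss1a), revcomp(ss2)


ss1a, ss2 = "ACCG", "TTAG"  # hypothetical fragment and second-molecule overhangs
ssa1, ssa2 = adapter_overhangs(ss1a, ss2)
print(anneals(ss1a, ss2))                                   # False: no direct binding
print(ssa1, ssa2, anneals(ss1a, ssa1), anneals(ssa2, ss2))  # CGGT CTAA True True
```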

As used herein, said first and second nucleic acid 
molecules are any naturally occurring or synthetic 
polynucleotide molecules, e.g. DNA, such as genomic or 
cDNA, PNA and their analogs, which are double stranded 
and in which single stranded regions may be generated. 

Fragments of the first nucleic acid molecule are
generated by use of a nuclease which cleaves outside its
recognition site. One or more fragments may be
generated depending on the sites which are cleaved (e.g.
if the site is at the extreme end of the molecule, only
a few bases may be removed rather than 2 fragments being
produced). Other nucleic acid molecule fragments
described herein may be generated by any appropriate
means, as mentioned herein, including the techniques
used to produce the first nucleic acid molecule
fragments. Fragments are preferably more than 10 bases,
e.g. 10 to 200 bp, preferably more than 100 bases in
length. For cloning applications, fragments having
lengths in excess of 200 bases, e.g. from 200 bases to
2 kb, may be used. Where longer single stranded regions
are generated, fragments of longer lengths are also
contemplated, e.g. 10-100 kb or longer.

"Single stranded regions" as referred to herein are
regions of overhang at the end, ie. at the terminus, of
the first, second or third nucleic acid molecules or
adapter molecules. These regions are sufficient to
allow specific binding of molecules having complementary
single stranded regions and subsequent ligation between
these molecules. Thus, the single stranded regions are
at least 1 base in length, preferably at least 3 bases
and more preferably at least 4 bases, e.g. from 4 to 10
bases, e.g. 4, 5 or 6 bases in length. Single stranded
regions up to 20 bases in length are contemplated, which
will allow the use of fragments up to megabases in
length in the method of the invention.

"Binding" as used herein refers to the step of 
association of complementary single stranded regions 
(ie. non-covalent binding). Subsequent "ligation" of 
the sequences achieves covalent binding. 

"Complementary" as used herein refers to specific 
base recognition via for example base-base 
complementarity. However, complementarity as referred 
to herein includes pairing of nucleotides in Watson-Crick
base-pairing in addition to pairing of nucleoside
analogs, e.g. deoxyinosine, which are capable of specific
hybridization to the base in the nucleic acid molecules,
and other analogs which result in such specific
hybridization, e.g. PNA, DNA and their analogs.
Complementarity of one single stranded region to another 
is considered to be sufficient when, under the 
conditions used, specific binding is achieved. Thus in 
the case of long single stranded regions some lack of 
base-base specificity, e.g. mis-match, may be tolerated, 
e.g. if one base in a series of 10 bases is not 
complementary. Such slight mismatches which do not 
affect the ultimate binding and ligation of the single 
stranded regions are considered to be complementary for 
the purposes of this invention. The single stranded 
regions may retain portions, on binding, which remain 
single stranded, e.g. when overhangs of different sizes 
are employed or the complementary portions do not 
comprise all of the single stranded regions. In such 
cases, as mentioned above, providing binding can be 
achieved the single stranded regions are considered to 
be complementary. In those cases, prior to ligation, 
missing bases may be filled in e.g. using Klenow 
fragment, or other appropriate techniques as necessary. 
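The mismatch tolerance described above can be expressed as a count of non-pairing positions when two single stranded regions are aligned antiparallel. This is a sketch only: the sequences are hypothetical, the one-in-ten allowance mirrors the example in the text, and treating deoxyinosine ("I") as pairing with any base is a simplification, since inosine pairs most stably with C.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def mismatches(oh1: str, oh2: str) -> int:
    """Non-pairing positions when oh1 and oh2 (both 5'->3') align antiparallel."""
    if len(oh1) != len(oh2):
        raise ValueError("this sketch only compares equal-length regions")
    return sum(
        1
        for a, b in zip(oh1, reversed(oh2))
        # inosine treated as a universal partner (simplification)
        if not (a == "I" or b == "I" or (a, b) in WATSON_CRICK)
    )


def complementary(oh1: str, oh2: str, max_mismatches: int = 0) -> bool:
    return mismatches(oh1, oh2) <= max_mismatches


oh1 = "ACGTTGCAAG"                                         # hypothetical 10-base region
print(complementary(oh1, "CTTGCAACGT"))                    # True: exact reverse complement
print(complementary(oh1, "CTTGCTACGT"))                    # False: one mismatch, none allowed
print(complementary(oh1, "CTTGCTACGT", max_mismatches=1))  # True: 1 in 10 tolerated
print(complementary(oh1, "CTTGCIACGT"))                    # True: inosine at that position
```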

"Adapters" as referred to herein are molecules 
which adapt the first nucleic acid molecule fragment for 
binding to a second or third nucleic acid molecule. 
Adapter molecules comprise at least two regions: a
first portion containing a single stranded region which
is complementary to the single stranded region on the
first nucleic acid molecule fragment, and a second
portion containing a single stranded region which is
complementary to the single stranded region on the
second nucleic acid molecule. The single stranded
regions are as described hereinbefore and are preferably
on different strands making up the adapter molecule.
The above mentioned portions are at least as large as
the single stranded regions, e.g. 4 to 6 bases in
length, although they may be longer, e.g. up to 20 bases
in length.

A linking region between these single stranded
regions is required for the stability of the molecule.
Conveniently this comprises a double stranded nucleic
acid fragment, especially in methods of cloning where
amplification, replication and/or translation are to be
performed. However, this portion may be substituted by
any appropriate molecule depending on the end use of the
resulting ligated molecule. Clearly, to achieve
ligation between the first and second nucleic acid
molecules, appropriate attachment points and moieties
for ligation must be provided.

The linking portion may serve more than just a
linking function and may, for example, provide sequences
appropriate for primer or probe binding, e.g. for
amplification or identification, respectively, or may
contain integration sites for mobile elements such as
transposons and the like. Depending on how the method
is performed, the adapters preferably do not contain
restriction sites for any restriction enzymes used in
the method of the invention, thus avoiding the need to
inactivate or remove the enzymes prior to the addition
of the adapters.

Conveniently, adapter molecules may consist exclusively
of a nucleic acid molecule in which the various
properties of the adapter are provided by different
regions of the adapter.

Conveniently, adapters are made up of two
complementary oligonucleotides having between 10 and 100
bases each, e.g. between 20 and 50 bases.
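The two oligonucleotides of an adapter follow directly from its overhangs and its double stranded core. The sketch below assumes 5' overhangs at both ends and uses a hypothetical 20-base core. It also screens both strands for recognition sites of enzymes used in the same reaction, in line with the preference above that adapters lack such sites; the site table (BbsI GAAGAC, Esp3I CGTCTC) is an illustrative assumption. `adapter_oligos` and `sites_present` are hypothetical names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Illustrative site table (assumption): enzymes used in the same reaction.
SITES = {"BbsI": "GAAGAC", "Esp3I": "CGTCTC"}


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def adapter_oligos(ssa1: str, ssa2: str, core: str):
    """Return (top, bottom) oligos, both written 5'->3'.

    top = SSA1 + core and bottom = SSA2 + revcomp(core); once annealed, SSA1
    is a 5' overhang at one end and SSA2 a 5' overhang at the other."""
    top, bottom = ssa1 + core, ssa2 + revcomp(core)
    for oligo in (top, bottom):
        if not 10 <= len(oligo) <= 100:
            raise ValueError(f"{oligo}: length outside 10-100 bases")
    return top, bottom


def sites_present(top: str, bottom: str, sites=SITES):
    """Names of enzymes whose site occurs on either strand (read 5'->3')."""
    return [name for name, site in sites.items() if site in top or site in bottom]


core = "GCTTACGATCAGTTCAGCAT"  # hypothetical 20-base core
top, bottom = adapter_oligos("CGGT", "CTAA", core)
print(top, bottom, sites_present(top, bottom))
```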

In the method described above, preferably at least
one first nucleic acid molecule fragment is generated
having a single stranded region at either end (SS1a and
SS1b), to each of which an adapter binds.

Preferably the method described herein is used for
cloning. Thus, in the method described above, an
adapter is bound at either end of the first nucleic acid
molecule fragment (the adapters may be the same or
different), and the unbound end of the first adapter
is bound to the second nucleic acid molecule, while the
unbound end of the second adapter binds either to the
second nucleic acid molecule (ie. at the other end,
distal to the binding of the first adapter, thereby
forming a circular molecule) or to a third nucleic
acid molecule. The first of these two alternatives may
arise through cleavage of a circular vector to give rise
to the second nucleic acid molecule, to which the
[adapter 1]:[first nucleic acid molecule
fragment]:[adapter 2] insert is bound to re-circularize
the vector. Alternatively, a linear or circular vector
may be cleaved giving rise to two or more discrete
fragments (herein the second and third nucleic acid
molecules), which may be joined by adapter 1:first
nucleic acid molecule:adapter 2.

Thus, in a preferred feature, a first nucleic acid
molecule fragment is generated which has a single
stranded nucleotide region at either terminus (SS1a and
SS1b), each of which is bound by an adapter, which may
be the same or different, and the first of said adapters
is bound to said second nucleic acid molecule and the
second of said adapters binds either to said second
nucleic acid molecule or to a third nucleic acid
molecule.

Thus, alternatively stated, in a preferred
embodiment, the present invention provides a method of
cloning a fragment of a first nucleic acid molecule into
a second nucleic acid molecule, wherein said method
comprises at least the steps:

1) cleaving said first nucleic acid molecule with a
nuclease which has a cleavage site separate from its
recognition site to create one or more fragments of said
first nucleic acid molecule, wherein at least one
fragment has a single stranded nucleotide region at both
termini (SS1a and SS1b),

2) cleaving said second nucleic acid molecule to
create at least two single stranded regions (SS2a and
SS2b) at the site of said cleavage (e.g. linearizing a
circular vector or producing fragments in a linear or
circular vector),

3) binding to one of the single stranded regions of
step 1) (SS1a)

a first adapter molecule comprising at one terminus
a single stranded region (SSA1) complementary to
the single stranded region of said first nucleic
acid molecule fragment (SS1a) and additionally
comprising at the other terminus a further single
stranded region (SSA2) complementary to one of the
single stranded regions (SS2a) produced by cleavage
of said second nucleic acid molecule, and
binding to a second single stranded region of step 1)
(SS1b)

a second adapter molecule as defined above which
binds to the second single stranded region of said
first nucleic acid molecule fragment (SS1b) and to
the second single stranded region (SS2b) produced
by cleavage of said second nucleic acid molecule,

4) ligating said adapters to said first nucleic acid
fragment,

5) binding said adapters to said second nucleic acid
molecule or fragments thereof, and

6) ligating said adapters to said second nucleic acid
molecule or fragments thereof.
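The two-adapter cloning scheme can be checked as a closed ring of ligation junctions. In this sketch every molecule is reduced to a pair (left, right) of 5' overhangs, left on the top strand and right on the bottom strand, both read 5' to 3'; calling the vector's right end SS2a is a naming assumption, and all sequences and function names are hypothetical. The insert cannot be ligated to the vector directly, but the designed adapters close the circle.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def joins(x, y) -> bool:
    """Piece y can follow piece x if y's left overhang anneals to x's right one."""
    return y[0] == revcomp(x[1])


def design_adapters(insert, vector):
    """Adapter 1 links the vector's right end (SS2a) to the insert's left end
    (SS1a); adapter 2 links the insert's right end (SS1b) to the vector's
    left end (SS2b)."""
    adapter1 = (revcomp(vector[1]), revcomp(insert[0]))
    adapter2 = (revcomp(insert[1]), revcomp(vector[0]))
    return adapter1, adapter2


def closes_circle(pieces) -> bool:
    n = len(pieces)
    return all(joins(pieces[i], pieces[(i + 1) % n]) for i in range(n))


insert = ("ACCG", "GTTC")  # hypothetical SS1a, SS1b
vector = ("TTAG", "CAGG")  # hypothetical SS2b (left end), SS2a (right end)
a1, a2 = design_adapters(insert, vector)
print(joins(vector, insert), joins(insert, vector))  # False False: no direct cloning
print(closes_circle([vector, a1, insert, a2]))       # True
```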

In instances in which cleavage of the second 
nucleic acid molecule results in the production of two 
or more discrete fragments which become ligated to the 
first nucleic acid molecule fragment via the adapters, 
said fragments constitute second and third nucleic acid 
molecules of the invention. 

Preferably, to prevent concatemerization of
[adapter:first nucleic acid fragment:adapter] units, the
single stranded regions of the second and third nucleic
acid molecules which bind to these adapters are not
complementary. Thus, for example, where cloning into a
vector is performed, said vector is preferably
linearized and at least a portion of said vector is
removed from one terminus of that vector, e.g. at least
two cleavage events occur.
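The requirement above can be reduced to a simple design check, again assuming 5' overhangs read 5' to 3'. The free ends of an [adapter:first nucleic acid fragment:adapter] unit are the reverse complements of SS2a and SS2b, so units can join one another exactly when SS2a and SS2b can join each other; a self-complementary (palindromic) overhang adds a further risk of end-to-end joining. The function name and example overhangs are hypothetical; "AATT" is an EcoRI-style palindromic overhang.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def concatemer_risks(ss2a: str, ss2b: str):
    """List design problems that would let vectors or insert units self-join."""
    risks = []
    if ss2b == revcomp(ss2a):
        risks.append("SS2a and SS2b are complementary: vector re-closure and unit concatemers")
    for name, oh in (("SS2a", ss2a), ("SS2b", ss2b)):
        if oh == revcomp(oh):
            risks.append(f"{name} is self-complementary: units can join end to end")
    return risks


print(concatemer_risks("CCTG", "TTAG"))  # []
print(concatemer_risks("AATT", "AATT"))  # three risks
```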

In such methods, particularly for cloning, the
second nucleic acid molecule, e.g. that into which a
first nucleic acid molecule fragment is inserted, is
conveniently a vector (or a part thereof, e.g. where the
second and third nucleic acid molecules together
comprise the vector and result from its cleavage).
Such vectors include any double stranded nucleic acid
molecule, which may be linear or circular. (However, as
mentioned above in respect of the adapters, providing
single stranded regions exist, or are generated, at the
termini of the second nucleic acid molecule or its
fragments (e.g. the vector), the adjacent regions may be
made up of any molecule, providing ligation at the
termini to the adapters is not compromised.)

Conveniently such vectors may contain sequences
which aid their use in methods of the invention or their
subsequent manipulation. Thus, vectors are conveniently
selected with only two or a small number of restriction
cleavage sites for the method of cleavage used. Thus,
for example, where restriction enzymes are used, the
vector is selected to include only a minimal number of
recognition sites for that enzyme, preferably only two.

Vectors may additionally comprise further portions
or sequences for cloning, selection, amplification,
transcription or translation as appropriate. Thus
vectors may carry probe or primer sites, promoter
regions and other regulatory regions, e.g. expression
control sequences etc. Conveniently, well-known cloning
vectors are employed, such as pBR322 and derived
vectors, pUC vectors such as pUC19, lambda vectors, BAC,
YAC and MAC vectors and other appropriate plasmids or
viral vectors.

The molecule of which a fragment is to be inserted,
ie. the first nucleic acid molecule, may be any molecule
which can generate single stranded regions at at least
one of its ends using the nucleases described herein,
although the central portion may be varied as
appropriate. Preferably, however, such molecules are
double stranded nucleic acid molecules and contain
appropriate sites for the use of enzymes to create the
single stranded overhangs which are required in
accordance with the invention. Appropriately, the first
nucleic acid molecule is derived from genomic DNA and
the method of the invention is used to insert fragments
thereof into appropriate vectors.

Adapters which may be used range from short double
stranded nucleic acid molecules with single stranded
regions at their termini to longer molecules which may
contain further sequences, for example to allow selection
as described hereinafter. Appropriate single stranded
regions are selected on the basis of the terminal
sequence of the first, second and third nucleic acid
molecules or fragments thereof. Appropriate selection
may also be used to direct the orientation of the
insert, e.g. to produce clones which may be used to
produce antisense nucleic acid molecules.

Adapters may be used in the methods of the
invention in which their single stranded overhangs have
already been generated, e.g. by the combination of
single stranded complementary oligonucleotides which on
hybridization leave overhangs at either end, or by
appropriate cleavage or digestion.

Alternatively, during the method of the invention,
adapters may be modified to provide single stranded
portions, e.g. by the use of restriction enzymes or
other appropriate techniques during the course of the
reaction. Conveniently, to reduce the number of


WO 01/00816 PCT/GB00/02512 

- 14 - 

steps, the enzymes used to generate single stranded
regions in the first, second or third nucleic acid
molecules (where necessary) may be used to generate the
adapter single stranded regions.

As mentioned previously, the single stranded region
may be 4 or more bases in length. When using longer
overhangs, or where the sequence of the full
corresponding single stranded region of the first,
second or third nucleic acid molecules is not known or
is unclear, a family of adapters with one or more
degenerate bases in the single stranded region may be
used, for example using methods to create libraries of
adapters. Degenerate bases may also be used at
positions prone to mismatch ligations.

For convenience, a universal library of adapters may
be created for use in the method of the invention. Thus,
for example, 16 different adapters with a 4 base
overhang consisting of two random bases (NN) and two
bases specific to each adapter (e.g. AA, CC,...TT) may
be created. In this way, enough adapters may be
created to distinguish between 16
different first molecule fragment overhangs, which would
suffice for many cloning purposes. Similarly, a library
addressing second molecule (e.g. vector) overhangs may be
created.
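The universal library just described can be enumerated mechanically. The Python sketch below is illustrative only and rests on stated assumptions: an adapter overhang is taken to pair with a fragment overhang when it matches the reverse complement of that overhang, degenerate N positions are treated as matching any base, and the function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def universal_adapter_overhangs() -> list[str]:
    """16 overhangs: two degenerate bases (NN) + two adapter-specific bases."""
    return ["NN" + a + b for a, b in product(BASES, repeat=2)]


def pairs_with(adapter_overhang: str, fragment_overhang: str) -> bool:
    """True if the adapter overhang can anneal to the fragment overhang.

    Annealing is antiparallel, so the adapter overhang must match the
    reverse complement of the fragment overhang; 'N' matches any base.
    """
    target = revcomp(fragment_overhang)
    return len(adapter_overhang) == len(target) and all(
        a == "N" or a == t for a, t in zip(adapter_overhang, target)
    )


library = universal_adapter_overhangs()
hits = [ov for ov in library if pairs_with(ov, "TTGC")]
print(len(library), hits)  # 16 ['NNAA']
```

Because only two positions are adapter-specific, each of the 256 possible 4 base fragment overhangs is captured by exactly one of the 16 adapters, i.e. the library sorts fragment overhangs into 16 classes.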

To increase the number of permutations in an
adapter library, two separate oligonucleotide libraries
may be generated: the first with single stranded
oligonucleotides having regions that will correspond to
the single stranded region of the first nucleic acid
molecule fragment, and the second with single
stranded oligonucleotides having regions that will
correspond to the single stranded region of the second
nucleic acid molecule (e.g. vector). However, every
member of each library shares a common region that is
complementary to the common region of the other library,
such that when one member from the first library is
selected and combined with a member of the second
library, they will hybridize, leaving free the relevant
single stranded regions. Thus, for example, to generate
an adapter with an AA overhang and a TC overhang to bind
to the first and second nucleic acid molecules
respectively, members of the different libraries such as
GGGCCCCNNAA and TCNNCCCGGGG may be combined to form
(lower strand written 3' to 5'):

    GGGCCCCNNAA
TCNNCCCGGGG

which exhibits the appropriate overhangs. Using
only two 16 member libraries, this allows the production
of 256 different adapters.
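The combinatorial gain from two libraries can be sketched as follows. This is a hypothetical illustration, not the procedure of the examples: CORE is an arbitrary shared region chosen for the sketch, each oligo is written 5' to 3' with its variable tail at the 3' end, and build_library and hybridize are illustrative names.

```python
from itertools import product

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical shared region (not self-complementary); chosen for illustration.
CORE = "GGGCCCC"


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_library(core: str) -> list[str]:
    """16 oligos (5'->3'): shared core, then NN plus two member-specific bases."""
    return [core + "NN" + a + b for a, b in product(BASES, repeat=2)]


def hybridize(top: str, bottom: str, core_len: int) -> tuple[str, str]:
    """Pair two oligos through their cores; return the two free 3' tails."""
    if top[:core_len] != revcomp(bottom[:core_len]):
        raise ValueError("cores are not complementary")
    return top[core_len:], bottom[core_len:]


library_1 = build_library(CORE)           # addresses first-molecule overhangs
library_2 = build_library(revcomp(CORE))  # addresses second-molecule overhangs

adapters = [hybridize(t, b, len(CORE)) for t, b in product(library_1, library_2)]
print(len(adapters))  # 256
```

Only 32 oligonucleotides need to be synthesized, yet any member of one library can pair with any member of the other, giving 16 × 16 = 256 distinct combinations of free tails.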

In generating appropriate adapters, the
amount of mismatch which needs to be tolerated when
binding to overhangs on the first, second and/or third
nucleic acid molecules should conveniently be reduced. This may
be achieved by selecting oligonucleotides
on the basis of the probability of a mismatch ligation
being generated. A computer program for achieving this
is described in more detail in Example 6. This method
allows sets of oligonucleotides to be identified which
can be used to construct chains with more than 100
fragments in a single ligation cycle but with very low
levels of mismatch. Thus, in a further feature, the
present invention provides computer software adapted to
identify adapter molecules for use in the method of the
invention.
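The program of Example 6 is not reproduced here. As a stand-in, the sketch below shows one simple way to pick overhangs with a low risk of mismatch ligation: a greedy filter on Hamming distance. The distance threshold and function names are assumptions for illustration, and the approach is not necessarily the one used in Example 6.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def pick_overhangs(length: int = 4, min_dist: int = 2) -> list[str]:
    """Greedily choose overhangs that are unlikely to mis-ligate.

    Each chosen overhang, and its complementary partner, must differ in
    at least `min_dist` positions from every other overhang or partner
    present in the reaction.
    """
    chosen: list[str] = []
    for bases in product("ACGT", repeat=length):
        ov = "".join(bases)
        # Palindromic or near-palindromic overhangs can pair with themselves.
        if hamming(ov, revcomp(ov)) < min_dist:
            continue
        if all(
            hamming(ov, c) >= min_dist and hamming(ov, revcomp(c)) >= min_dist
            for c in chosen
        ):
            chosen.append(ov)
    return chosen


selected = pick_overhangs()
print(len(selected), selected[:4])
```

Each overhang and its complementary partner are both present in a ligation, which is why a candidate is compared against the chosen overhangs and their reverse complements; raising min_dist trades set size for lower mismatch risk.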

As mentioned above, fragments of
said first nucleic acid molecule are produced using a
nuclease which has a cleavage site separate from its
recognition site. In so doing, unique overhangs are
created which reflect the sequence of that molecule. In
a preferred feature, said nuclease is a class IP or IIS
restriction enzyme or a functional derivative thereof.
Such enzymes include enzymes produced synthetically
through the fusion of appropriate domains to arrive at
enzymes which cleave at a site distal to their
recognition site.
These enzymes show no specificity for the
sequence at the cleavage site itself, and they can therefore
generate overhangs of any base composition. Cleavage
with IIS enzymes results in overhangs of various lengths,
e.g. from -5 to +6 bases. Preferably, for
performing the method of the invention, enzymes are
chosen which generate 3-6, e.g. 4, base overhangs.
Preferred enzymes for use in the invention include
enzymes which produce 4 base overhangs at the 3' end:
BstXI; 5 base overhangs at the 3' end: AloI, BaeI, BplI,
Bsp24I; 6 base overhangs at the 3' end: CjeI, CjePI,
HaeIV; 4 base overhangs at the 5' end: AceIII, Acc36I,
Alw26I, AlwXI, BbrII, BbsI, BbvI, BbvII, Bbv16II,
Bli736I, BpiI, BpuAI, BsaI, Bsc91I, BseKI, BseXI, BsmAI,
BsmBI, BsmFI, Bso31I, Bsp423I, BspBS31I, BspIS4I,
BspLU11III, BspMI, BspST5I, BspTS514I, Bst12I, Bst71I,
BstBS32I, BstGZ53I, BstTS5I, BstOZ616I, BstPZ418I,
Eco31I, EcoA4I, Eco044I, Esp3I, FokI, PhaI, SfaNI,
Sth132I, StsI; and 5 base overhangs at the 5' end: HgaI.
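FokI, one of the enzymes listed above, recognizes 5'-GGATG-3' and cuts 9 bases downstream on that strand and 13 bases downstream on the complementary strand, leaving a 4 base 5' overhang whose sequence is set by the target DNA rather than by the enzyme. The Python sketch below, with a hypothetical function name, computes that overhang; as a simplification it scans only the GGATG orientation.

```python
FOKI_SITE = "GGATG"
TOP_CUT, BOTTOM_CUT = 9, 13  # FokI: GGATG(9/13)


def foki_overhangs(seq: str) -> list[tuple[int, str]]:
    """Return (cut position, 4 base overhang) for each forward FokI site.

    Only sites in the GGATG orientation are scanned; reverse-orientation
    sites (CATCC), which cut upstream, are ignored in this sketch.
    """
    seq = seq.upper()
    results = []
    start = seq.find(FOKI_SITE)
    while start != -1:
        site_end = start + len(FOKI_SITE)
        top = site_end + TOP_CUT
        bottom = site_end + BOTTOM_CUT
        if bottom <= len(seq):  # both strands must be cut within the sequence
            # Reported as the top-strand stretch between the two cuts.
            results.append((top, seq[top:bottom]))
        start = seq.find(FOKI_SITE, start + 1)
    return results


# The same recognition site leaves different overhangs in different contexts.
print(foki_overhangs("GGATG" + "A" * 9 + "CGTA" + "TTTTT"))  # [(14, 'CGTA')]
print(foki_overhangs("GGATG" + "A" * 9 + "GACT" + "TTTTT"))  # [(14, 'GACT')]
```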
Over 100 classes of IIS restriction endonucleases
have been identified and there are large variations both
with respect to substrate specificity and cleavage
pattern. In addition, these enzymes have proved to be
well suited to "module swapping" experiments, so that
new enzymes can be created for particular requirements
(Huang B. et al., J. Protein Chem. 1996, 15(5):481-9;
Bickle T.A., 1993, in Nucleases (2nd edn); Kim Y.G. et
al., PNAS 1994, 91:883-887). In these experiments the
binding domain of transcription factor Sp1 was merged
with the cleavage domain of FokI to construct a class
IIS restriction endonuclease that makes a 4-base
overhang adjacent to Sp1 sites. In other experiments a class
IIS restriction endonuclease that cuts outside the
binding sites of transcription factor Ultrabithorax was
generated. Corresponding experiments have been
conducted on class I enzymes. By merging the N-terminal
part of the hsdS sub-unit of StyR 1241 (which recognizes
GAAN6RTCG) with the C-terminal part of the hsdS sub-unit
of StyR 1241 (which recognizes TCAN7RTTC), a new enzyme
that recognizes the sequence GAAN6RTTC was constructed.
Several other experiments have been carried out with
similar success. Unlike the case of ordinary class
II enzymes, it is therefore reasonable to assume that a
number of new IIS and IP restriction enzymes can be
constructed and adapted to cloning requirements that may
arise in the future. Many combinations and
variants of these enzymes can therefore be used
according to the principles described herein.

Generation of the single stranded regions on said
first nucleic acid fragment may be achieved directly by
cleavage of said first nucleic acid molecule with the
nucleases described herein, without the formation of
intermediate molecules. This forms a preferred feature
of the invention. Alternatively, indirect and more
elaborate techniques may be used. For example, the
first nucleic acid molecule or a fragment thereof may be
"trimmed" using the nucleases described herein: linker
molecules which carry the nuclease recognition
site are bound to the first nucleic acid molecule or
fragment thereof, and cleavage outside the recognition
site results in cleavage within the first nucleic acid
molecule or fragment thereof. This method is
particularly useful since it takes advantage of the fact
that T4 DNA ligase (and also other ligases) works well
in most buffers used for restriction cutting. Ligation
and cleavage can therefore be performed simultaneously
in the same solution. Furthermore, this method allows
the generation of a unique overhang when the overhang
generated by the first cleavage step is not unique.

The trimming procedure may be initiated using an
"initiation linker" that is addressed to an overhang on
the first nucleic acid molecule or fragment thereof,
e.g. after cleavage with one or more restriction
endonucleases as described herein. As used herein, a
"linker" refers to a molecule which is similar to an
"adapter" as described herein, except that the linker
need only contain one single stranded region to allow
binding to the molecule to be trimmed. Furthermore, the
initiation linker contains one or more recognition sites
for nucleases that cleave outside their own recognition
sequence, as described herein, for example BplI. The
first nucleic acid molecule or fragment thereof should
preferably not contain sites for the IIS
enzyme(s) used for the trimming procedure. Alternatively,
such sites may be inactivated prior to
the trimming procedure (e.g. by methylation).

Propagation linkers (if used), a termination
linker (which may be an adapter as
described herein), T4 DNA ligase and the IIS enzyme(s)
used for the trimming may be added together with the
initiation linker. Once the initiation linker has been
ligated into position, cleavage may be effected,
resulting in the generation of an overhang within the
first nucleic acid molecule or fragment thereof. If
desired (ie. if further trimming is required), a
propagation linker containing degenerate overhangs may
be ligated to the overhang which has been
generated. Since this linker also carries an
appropriate nuclease recognition site, cleavage will
again occur, further into the first nucleic acid
molecule or fragment thereof. This process continues
until an overhang is generated that is complementary to one of the
overhangs in the termination linker (or adapter as
described herein). This final linker does not itself
carry the nuclease recognition site and therefore
terminates trimming. As mentioned previously, this
termination linker may have an appropriate single
stranded region for binding to the adapter used in the
next step, or may itself be the adapter. An appropriate
technique for performing the trimming method may be
found in Examples 4 and 9.
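The cycle of ligation and cleavage in the trimming procedure can be modelled as a simple loop. This is a bookkeeping sketch under hypothetical assumptions: each propagation cycle is taken to remove a fixed number of bases (step), the termination linker is represented by the set of exposed overhangs it can capture, and the names are illustrative.

```python
def trim(end_seq: str, terminators: set[str], step: int = 4,
         overhang_len: int = 4, max_cycles: int = 50) -> tuple[int, str]:
    """Simulate linker-driven trimming of one end of a molecule.

    end_seq     -- sequence read inward from the overhang exposed after the
                   initiation linker has been ligated and cleaved
    terminators -- exposed overhangs the termination linker can capture
    step        -- bases removed per propagation cycle (hypothetical value)
    Returns (number of propagation cycles, remaining sequence).
    """
    pos = 0
    for cycle in range(max_cycles + 1):
        overhang = end_seq[pos:pos + overhang_len]
        if len(overhang) < overhang_len:
            raise ValueError("sequence exhausted before a terminating overhang appeared")
        if overhang in terminators:
            # Termination linker ligates; it lacks the recognition site, so trimming stops.
            return cycle, end_seq[pos:]
        # A propagation linker (degenerate overhang) ligates and the IIS enzyme cuts further in.
        pos += step
    raise RuntimeError("no terminating overhang within max_cycles")


print(trim("AAAACCCCGGTTGATC", {"GGTT"}))  # (2, 'GGTTGATC')
```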

The trimming method is preferably not performed
with IIS enzymes belonging to the BcgI class (e.g. BplI,
BaeI etc.), as these proteins are combined methylases and
endonucleases and the methylase function may inactivate
the binding sites on propagation linkers. Enzymes
such as FokI, HgaI etc. are therefore preferred
for performing this method. If BcgI class
enzymes are to be used, the cofactor AdoMet should be
replaced with AdoHcy, sinefungin or other cofactors
that cannot function as methyl donors.

Thus, in a preferred feature, the invention provides
a method of removing the terminus of a double
stranded nucleic acid molecule with at least one single
stranded region, comprising at least the steps of (i)
binding (ie. ligating) to said molecule a double stranded
linker molecule containing a recognition site for a nuclease which
cleaves outside its recognition site and a single
stranded region complementary to the single stranded
region on said double stranded nucleic acid molecule,
and cleaving using said nuclease, thereby
resulting in removal of one or more bases (e.g. 3-10,
which may be in single or double stranded form, or a
combination thereof) from the terminus of said nucleic
acid molecule, (ii) optionally binding one or more
propagation linkers which contain a recognition site for a
nuclease as described above and a degenerate single
stranded region which binds to the overhang generated by
the first or subsequent cleavage steps, and cleaving
using said nuclease, and (iii) adding a termination
linker which binds to the single stranded region
generated in step (i) or (ii).

A similar technique may be used to remove unwanted
sequences, e.g. those contributed by the adapter after
ligation of the first nucleic acid molecule fragment and
second (or third) nucleic acid molecules. Various
techniques may be used to remove the unwanted sequences.
For example, if the sequence (e.g. a region from the adapter)
contains a plant transposon sequence, this may be
removed by adding the necessary transposase enzymes to
excise that sequence. Alternatively, the unwanted
sequence may be removed by taking advantage of nucleases
that cleave outside their recognition site. Thus, for
example, adapters may be used which contain recognition
sites for such enzymes which, on cleavage (by appropriate
selection of cleavage site sequences), result in
overhangs generated at two distinct cleavage sites which
are complementary and thus allow concomitant excision of
the intervening sequence. Examples of techniques for
removing intervening sequences are shown in Example
5. It will be appreciated that, depending on the
nuclease employed, it may be necessary to inactivate
sites for that enzyme at locations other than adjacent
to or within the intervening sequence.

Thus, in a further preferred feature, adapters as
used herein additionally comprise one or more nuclease
recognition and cleavage sites, whereby the arrangement of
said sequences allows, on cleavage, generation of
complementary single stranded regions wherein each one
of said pair of single stranded regions is generated by
cleavage at a distinct site.
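The principle of excision between two distinct cleavage sites that leave complementary overhangs can be sketched in a few lines. Assumptions (hypothetical, for illustration): each cut leaves a 4 base 5' overhang starting at the cut position in top-strand coordinates, and the function name is invented.

```python
def excise(seq: str, a: int, b: int, overhang_len: int = 4) -> str:
    """Join the left end of a cut at `a` to the right end of a cut at `b`.

    Each cut is assumed to leave a 5' overhang covering
    seq[cut:cut + overhang_len] (top-strand coordinates). The two ends
    anneal only if those stretches are identical, i.e. the left end's
    overhang is complementary to the right end's. The product drops seq[a:b].
    """
    if not 0 <= a < b <= len(seq) - overhang_len:
        raise ValueError("cut positions out of range")
    if seq[a:a + overhang_len] != seq[b:b + overhang_len]:
        raise ValueError("overhangs are not complementary; ends cannot be joined")
    return seq[:a] + seq[b:]


joined = excise("GGGG" + "ACGT" + "TTTTTT" + "ACGT" + "CCCC", 4, 14)
print(joined)  # GGGGACGTCCCC
```

One copy of the shared overhang sequence survives at the new junction, while the intervening stretch is lost.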

Depending on how the different steps in the method
of the invention are performed, as described
hereinafter, the second nucleic acid molecule and/or the
adapters may also, where necessary, be cleaved or
digested to provide appropriate single stranded regions.
In a preferred feature, the second nucleic acid molecule
and/or the adapters are cleaved using the nucleases
described above for generating the first nucleic acid
molecule fragments. However, instead of cleavage with
such nucleases, alternative techniques may be used to
generate appropriate single stranded regions and/or
fragments from the second or third nucleic acid molecules
or adapters. Thus, for example, other
restriction enzymes, non-specific nucleases,
appropriate exonucleases, or mechanical methods such as
sonication or vortexing may be used. Where enzymes are
employed, small reaction volumes are preferably used
to increase efficiency.

Ligation between the adapters and the first, second and
third nucleic acid molecules is achieved by any
appropriate technique known in the art (see, for example,
Sambrook et al., in "Molecular Cloning: A Laboratory
Manual", 2nd Ed., Editor Chris Nolan, Cold Spring Harbor
Laboratory Press, 1989). For example, ligation may be
achieved chemically or by use of appropriate naturally
occurring ligases or variants thereof. Appropriate
ligases which may be used include T4 DNA ligase and
thermostable ligases, such as Pfu, Taq and Tth DNA
ligase. Ligation may be prevented or allowed by
controlling the phosphorylation state of the terminal
bases, e.g. by appropriate use of kinases or
phosphatases. Appropriately large volumes may also be
used to avoid intermolecular ligations. Similarly, high
adapter to vector/insert ratios may be used to prevent the
vector or insert from religating into its source material.

Other techniques may be used to avoid or remove
vectors which become religated or which do not cleave.
For example, the insert may be cloned into a selection
marker that destroys the host bacteria unless it has
been inactivated by the insert. Alternatively,
restriction cleavage using restriction enzymes specific
for the fragment removed from the vector may be
performed after the ligation step; religated and
uncleaved vectors would be cleaved in this step.
The ideal cloning site is therefore one which contains
many unique restriction sites that are removed upon
insert ligation. Alternatively, well-known techniques
may be used for identifying the desired product, e.g.
gel separation.

If the steps of cleavage and ligation are performed
together, the insert and the vector into which it is
inserted advantageously do not contain binding sites for
the nuclease used. Similarly, it is advantageous if the
fragment removed from the vector during the process of
cloning contains binding sites for the nuclease. In
that case, if that fragment religates with the vector, it
is cleaved and thereby removed again.

Once the first and second nucleic acid molecules
(and optionally third nucleic acid molecules) or
fragments thereof have been covalently attached, the
appropriate products may, where necessary, be selected
from any side-products. Selection may be
performed by any technique known in the art.
Conveniently, however, labelled probes may be used to
identify sequences present only in the correct product,
e.g. by probing for one or more sequences formed only
through the union of the correct sequences, such as a probe
directed to the junction between the adapter and the
first, second or third nucleic acid sequences.
Alternatively, the correct ligation may be detected by 
functional properties bestowed on the product through 
ligation, e.g. through the completion of sequences which 
allow expression of a particular product once the vector 
has been cloned into an appropriate host. 
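A junction probe of the kind described above can be designed by taking bases from each side of the intended junction. The sketch below is an idealization with hypothetical sequences and names: detection is modelled as an exact string match on either strand, whereas a real probe works by hybridization and tolerates or rejects mismatches according to the conditions used.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_probe(left: str, right: str, flank: int = 10) -> str:
    """Probe spanning a junction: last `flank` bases of left + first of right."""
    if flank > min(len(left), len(right)):
        raise ValueError("flank longer than one of the joined sequences")
    return left[-flank:] + right[:flank]


def carries_junction(candidate: str, probe: str) -> bool:
    """True if either strand of the candidate contains the junction sequence."""
    return probe in candidate or revcomp(probe) in candidate


adapter = "ACGTACGTAAGGCC"  # hypothetical adapter sequence
insert = "TTGACCATGG"       # hypothetical insert sequence
probe = junction_probe(adapter, insert, flank=4)        # "GGCCTTGA"
print(carries_junction(adapter + insert, probe))        # True (correct product)
print(carries_junction(adapter + "C" * 10, probe))      # False (side-product)
```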

Alternatively, selection may be performed by sequencing
of the products which have been obtained, e.g. after
amplification and/or transformation.

Appropriate labels include any moieties which
directly or indirectly allow detection and/or
determination through the generation of a signal.
Many appropriate examples exist, including
radiolabels, chemical labels (e.g.
EtBr, TOTO, YOYO and other dyes), chromophores or
fluorophores (e.g. dyes such as fluorescein and
rhodamine), or reagents of high electron density such as
ferritin, haemocyanin or colloidal gold. Alternatively,
the label may be an enzyme, for example peroxidase or
alkaline phosphatase, wherein the presence of the enzyme
is visualized by its interaction with a suitable entity,
for example a substrate.

As mentioned previously, one of the significant
advantages which this method offers over known methods
is the simplification of the techniques which are
required. The steps described herein may be performed
sequentially in separate tubes (e.g. when different
enzymes are used and cross-reaction is undesirable) or
in a limited number of steps. Ideally, however, the
reaction is performed in a single step. This can be
achieved by appropriate selection of enzymes, adapters
and second/third nucleic acid molecules, e.g. vectors.
Thus, for example, the first nucleic acid molecule
may be fragmented using a particular nuclease which is
also used to fragment the second nucleic acid molecule.
Since the enzyme used cleaves outside its
recognition site, the resulting single stranded regions
on the first and second nucleic acid molecule fragments
would be expected to be unrelated. However, by
appropriate choice of the mediating adapters (which may
also be added, provided they do not have restriction
sites for that enzyme or that cleavage at those sites
reveals appropriate single stranded regions), these
unrelated sequences may be linked via the intermediacy
of the adapters. Thus the entire reaction may be
performed in a single step.

It will also be appreciated that the adapters may
be used to address the first nucleic acid fragments to
different second nucleic acid fragments or cleavage
sites. This would therefore allow different first
nucleic acid molecule fragments to be directed and
ligated to a particular vector or site within a vector.
Thus multiple vectors (and corresponding appropriate
adapters) may be used simultaneously, each taking up a
single first nucleic acid molecule fragment.

Alternatively, multiple fragments or copies of the
same fragment could be inserted at different sites
within the same vector (in the latter case by the use of
adapters with one common end but with the other end
exhibiting variability to allow it to bind to different
sites within the vector). In a further alternative, the
first nucleic acid molecule fragments could be captured
in the reverse orientation (again by appropriate adapter
choice) and inserted into a vector, e.g. to produce
antisense strands.
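Addressing fragments to particular vectors or sites comes down to choosing, for each adapter, one end complementary to the fragment overhang and one end complementary to the target overhang. The Python sketch below illustrates that bookkeeping; overhang sequences, route names and the function name are hypothetical, and both ends are treated as 4 base overhangs that pair antiparallel.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_adapters(routes: dict[str, str],
                    fragment_ovs: dict[str, str],
                    site_ovs: dict[str, str]) -> dict[str, tuple[str, str]]:
    """For each fragment -> site route, return (fragment end, site end) of an adapter."""
    adapters = {
        frag: (revcomp(fragment_ovs[frag]), revcomp(site_ovs[site]))
        for frag, site in routes.items()
    }
    # Every adapter end must be unique, otherwise routing would be ambiguous.
    ends = [end for pair in adapters.values() for end in pair]
    if len(ends) != len(set(ends)):
        raise ValueError("adapter ends are not unique")
    return adapters


fragments = {"F1": "TTGC", "F2": "CAGA"}  # hypothetical fragment overhangs
sites = {"V1": "GGAC", "V2": "ATCC"}      # hypothetical vector-site overhangs
print(design_adapters({"F1": "V2", "F2": "V1"}, fragments, sites))
# {'F1': ('GCAA', 'GGAT'), 'F2': ('TCTG', 'GTCC')}
```

Changing only the routing table redirects each fragment to a different site without altering the fragments themselves.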

Thus, in a preferred embodiment, the method described
herein is performed in a single step. The ligation
steps (ie. adapter to first nucleic acid molecule
fragment and final ligation) may, however, be conducted
separately once association of the relevant molecules
has been achieved. In a further preferred embodiment,
the invention provides a method of simultaneously
attaching two or more fragments of the first nucleic
acid molecule to different second nucleic acid molecules
(or different termini thereof). In cloning, this
equates to introducing the two or more fragments
into different sites in said second nucleic acid
molecules or into different second nucleic acid
molecules, e.g. into different sites within a vector or
into different vectors.

Thus the present invention provides methods in which
two or more fragments of the first
nucleic acid molecule are attached to different second
and optionally third nucleic acid molecules, or
different termini thereof. In a preferred feature,
methods are provided wherein one or more fragments of
said first nucleic acid molecule are attached via
adapters to single stranded regions in said second
nucleic acid molecule resulting from different cleavage
events. As a further preferred feature, methods are
provided wherein one or more fragments of said first
nucleic acid molecule are attached via adapters to
single stranded regions in two or more second nucleic
acid molecules.

It will be appreciated that even more complex 
reactions may be envisaged in which multiple first 
nucleic acid molecules (e.g. 2 or more, e.g. 2-10) are 
simultaneously cleaved in the same reaction and their 
fragments bound to appropriate adapters which direct 
them to bind to different second nucleic acid molecules, 
e.g. different vectors or sites in vectors. 

Whilst the methods described above are especially
simplified, the same
effects may also be achieved by performing the method in
discrete steps. This is particularly appropriate where
different enzymes are used which would produce
undesirable products in other molecules. Thus, for
example, different nucleases, such as restriction enzymes,
may be used to cleave the first and second nucleic acid
molecules. In such cases, the molecules are cleaved
separately, after which the enzymes are removed or
inactivated before the fragments are mixed together with
the adapters. Similarly, even if the same enzyme is
used, if the adapters contain enzyme-sensitive sites,
the adapters could be appropriately modified to avoid
reaction, e.g. by methylation, or the enzymes used to
fragment the first and/or second nucleic acid molecules
could be inactivated or removed (as mentioned above)
prior to the addition of the adapters.

Conveniently, inactivation of enzymes may be
achieved by incubation at 65°C or above, e.g. for 20
minutes. Alternatively, appropriate techniques
such as removal of the enzymes from the reaction, or the use
of chelators, inhibitors etc., may be used to achieve
inactivation.

Once appropriate clones have been generated and
selected, these may be treated according to standard
methods of amplification, transformation, replication,
expression or sequencing, depending on the proposed
application of the clones. Other aspects of the
invention thus include the nucleic acid molecule product
of the method (ie. the nucleic acid molecule that is the
[first nucleic acid molecule fragment]:[adapter]:[second
nucleic acid molecule] product), such as cloning and
expression vectors comprising that nucleic acid molecule
product, as well as transformed or transfected
prokaryotic or eukaryotic host cells, or transgenic
organisms containing a nucleic acid molecule produced
according to the method of the invention.

Appropriate expression vectors include appropriate
control sequences such as, for example, translational
(e.g. start and stop codons, ribosomal binding sites)
and transcriptional control elements (e.g. promoter-operator
regions, termination sequences) linked in
matching reading frame with the nucleic acid molecules
of the invention. Appropriate expression systems are
well known and documented in the art, as are methods
for their introduction and expression in prokaryotic or
eukaryotic cells or germ line or somatic cells to form
transgenic animals. Appropriate expression vectors for
transformation include bacteriophages and viruses, such
as baculovirus, adenovirus and vaccinia viruses.

Kits for performing the methods described herein
form a preferred aspect of the invention. Thus, viewed
from a further aspect, the present invention provides a
kit for attaching a first nucleic acid molecule fragment
to a second nucleic acid molecule or a fragment thereof,
comprising at least (i) one or more adapters as
described hereinbefore or means for producing such
adapters, (ii) the second nucleic acid molecule and
(iii) a nuclease which cleaves outside its recognition
site, wherein the terminus of one of said adapters has a
single stranded region complementary to a single
stranded region generated on said second nucleic acid
molecule after cleavage with said nuclease.

Preferably said kit comprises a library of
oligonucleotides, e.g. as described herein, particularly
as described in Example 3, from which appropriate
adapters may be generated. The library of
oligonucleotides as described herein forms a further
preferred feature of the invention. Thus, for example,
said library may comprise a plurality of
oligonucleotides comprising 1) a plurality of
oligonucleotides of the formula XNNNNN wherein X is one
or more bases (wherein said bases are as described
hereinbefore) and is invariant in all of said
oligonucleotides, and each N is a base at the 5' end
which is varied in the different oligonucleotides, ie.
to produce 1024 variants, 2) a plurality of
oligonucleotides of the formula X'NNNNN wherein X' is
complementary to X and is invariant in all of said
oligonucleotides, and each N is a base at the 5' end as
described hereinbefore, 3) a plurality of
oligonucleotides of the formula YNNNNN wherein Y, which
is not the same as X, is one or more bases (wherein said
bases are as described hereinbefore) and is invariant in
all of said oligonucleotides, and each N is a base at the
3' end as described hereinbefore, and 4) a plurality of
oligonucleotides of the formula Y'NNNNN wherein Y' is
complementary to Y and is invariant in all of said
oligonucleotides, and each N is a base at the 3' end as
described hereinbefore.
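The four sub-libraries can be enumerated directly. In this hypothetical sketch, X and Y are arbitrary invariant blocks, X' and Y' are taken to be their reverse complements, and each sub-library has five variable positions, the number implied by the 1024 (4^5) variants stated for the first; which physical end carries the variable block is not modelled, and the function name is illustrative.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def sublibrary(fixed: str, n_variable: int = 5) -> list[str]:
    """All oligos made of an invariant block followed by n variable bases,
    written in the order of the formula (e.g. XNNNNN)."""
    return [fixed + "".join(v) for v in product("ACGT", repeat=n_variable)]


X, Y = "GCAG", "TTCG"  # hypothetical invariant blocks; Y differs from X
libraries = {
    "XNNNNN": sublibrary(X),
    "X'NNNNN": sublibrary(revcomp(X)),
    "YNNNNN": sublibrary(Y),
    "Y'NNNNN": sublibrary(revcomp(Y)),
}
for name, members in libraries.items():
    print(name, len(members))  # 1024 each
```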

Optionally, the kit may contain other appropriate
components selected from the list including ligases,
enzymes necessary for inactivation and activation of
restriction or ligation sites, primers for amplification
and/or appropriate enzymes, buffers and solutions, and a
data carrier containing a computer program to assist in
the selection of oligonucleotides from the above
mentioned library. The use of such kits for performing
the method of the invention forms a further aspect of the
invention.

The above described method may be adapted to
combine multiple first, second, third etc. nucleic acid
molecules as described below. In this method multiple
fragments are combined by appropriate selection of the
single stranded regions which appear at their ends.
This has application in the production of specific
sequences for biological purposes, but has particular
utility in the production of nucleic acid molecule
chains in which the units making up the chains each
denote a unit of information, ie. the chains may be
used to store information, as will be described in more
detail below. As used herein, "chain" refers to a serial
arrangement of fragments as described herein. Such
chains are preferably linear and include branched and
unbranched fragment sequences. Thus, for example,
branched DNA fragments may be used to provide chains
with a branched arrangement of fragments.

To produce nucleic acid molecule chains with
different unit fragments, ie. fragment chains, the
following method may be used. Firstly, it is necessary
to generate fragments which have overhangs at either
end, to allow them to bind to one another. (The
ultimate 3' and 5' fragments may, however, have an
overhang only at the end which will become attached to
internal fragments.) As will be described in more
detail below, for certain applications appropriate
oligonucleotides may be derived from libraries in which
the members exhibit variability in at least some of
their bases. If libraries are to be produced in which
the members are double stranded, it will be appreciated
that the number of members in such a library could be
rather high. This can, however, effectively be reduced by
using a smaller number of smaller building blocks.

One strategy is to make two single-stranded
oligonucleotides using conventional techniques. In the
example described above (6 base double stranded linker
and 3 base overhangs at either end), oligonucleotides
having a region of 6 bases which complement each other
and so allow hybridization may be used. Since not all
of the bases in each oligonucleotide are involved in the
hybridization, single stranded regions extend beyond the
hybridizing region. Conveniently, the number of required
library members may be reduced even further if repeat
sequences appear frequently in the fragment chain. This
will be described in more detail below.
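For the example of a 6 base double stranded region with 3 base overhangs at each end, the two single stranded oligonucleotides making up one chain unit could be derived as sketched below. The sequences and function names are hypothetical, and both overhangs are assumed to be 5' overhangs.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def make_unit(core: str, left_overhang: str, right_overhang: str) -> tuple[str, str]:
    """Two oligos (5'->3') that anneal through `core`, giving a unit with a
    single stranded 5' overhang at each end.

    top    = left_overhang + core
    bottom = right_overhang + reverse complement of core
    """
    return left_overhang + core, right_overhang + revcomp(core)


def duplex_region_ok(top: str, bottom: str, overhang_len: int) -> bool:
    """Check that the parts of the two oligos outside the overhangs pair fully."""
    return top[overhang_len:] == revcomp(bottom[overhang_len:])


top, bottom = make_unit("ACCTGA", "TAG", "GCA")
print(top, bottom)  # TAGACCTGA GCATCAGGT
```

Each oligo is only 9 bases long, and a library can be built from such single stranded halves rather than from complete double stranded units.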

Once the appropriate double stranded chain units
(ie. fragments) have been created, they may be ligated
together in the same solution, provided the different
overhangs present on the sequences are unique.

Thus, in a further aspect, the present invention
provides a method of synthesizing a double stranded
nucleic acid molecule comprising at least the steps of:

1) generating n double stranded nucleic acid
fragments, wherein at least n-2 fragments have single
stranded regions at both termini and 2 fragments have
single stranded regions at at least one terminus,
wherein (n-1) single stranded regions are complementary
to (n-1) other single stranded regions, thereby
producing (n-1) complementary pairs,

2) contacting said n double stranded nucleic acid
fragments, simultaneously or consecutively, to effect
binding of said complementary pairs of single stranded
regions, and

3) optionally ligating said complementary pairs
simultaneously or consecutively to produce a nucleic
acid molecule consisting of n fragments.
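Because each overhang is unique, the order of the chain is fixed by the overhangs alone, which is why the n fragments can be mixed in one solution. The sketch below recovers that order from an unordered set of fragments. It assumes a linear, unbranched chain and 5' overhangs that pair antiparallel (so the left overhang of the next fragment is the reverse complement of the right overhang of the previous one); the class and function names are hypothetical, and only the bookkeeping is modelled, not the chemistry.

```python
from dataclasses import dataclass
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Fragment:
    name: str
    left: Optional[str]   # overhang at the left end; None for the first fragment
    right: Optional[str]  # overhang at the right end; None for the last fragment


def assemble(fragments: list[Fragment]) -> list[str]:
    """Order n fragments into a chain joined at n-1 complementary pairs."""
    by_left: dict[str, Fragment] = {}
    for f in fragments:
        if f.left is not None:
            if f.left in by_left:
                raise ValueError(f"overhang {f.left} is not unique")
            by_left[f.left] = f
    starts = [f for f in fragments if f.left is None]
    if len(starts) != 1:
        raise ValueError("expected exactly one fragment without a left overhang")
    chain = [starts[0]]
    while chain[-1].right is not None:
        partner = by_left.get(revcomp(chain[-1].right))
        if partner is None:
            raise ValueError(f"no partner for overhang {chain[-1].right}")
        chain.append(partner)
        if len(chain) > len(fragments):
            raise ValueError("overhangs form a cycle")
    if len(chain) != len(fragments):
        raise ValueError("some fragments were not incorporated")
    return [f.name for f in chain]


shuffled = [
    Fragment("C", "TCTG", None),
    Fragment("A", None, "TTGC"),
    Fragment("B", "GCAA", "CAGA"),
]
print(assemble(shuffled))  # ['A', 'B', 'C']
```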

The terms "nucleic acid molecule", "single stranded
regions", "complementary", "binding" and "ligating" are
as described hereinbefore.

In step 1) reference is made to (n-1) single
stranded regions complementary to (n-1) "other" single
stranded regions. This describes two families of single
stranded regions, which together comprise 2(n-1)
members, forming n-1 pairs. Thus "other" refers to
single stranded regions in the second family which are

"Contacting" as used herein refers to bringing
together the double stranded fragments under conditions
which are conducive to association of the complementary
single stranded regions. Depending on the method used,
this may ultimately allow ligation of the fragments
carrying those regions. It should, however, be noted that
the fragments may be linked by methods other than
ligation. For example, PCR may be used with appropriate
primers, e.g. pairs of primers.

Simultaneous or consecutive contacting and/or
ligation refers to the possibility of adding the
fragments individually or in groups to a growing chain
or simultaneously adding all n fragments together,
wherein ligation may be performed after each addition or
once all n fragments have been combined. Preferably
ligation is effected once all fragments have been
combined.

"Fragments" as used herein are as defined
hereinbefore, but are preferably shorter in length. Thus
fragments are preferably greater than 6 bases in length
(wherein said length refers to the length of each single
stranded oligonucleotide making up the fragment; the two
oligonucleotides may differ slightly in length from one
another), e.g. between 6 and 50 bases, e.g. from 8 to 25
bases.

As referred to herein, "n" is an integer of at
least 4, for example at least 10 or 100, e.g. between 25
and 200.

Preferably, as mentioned above, the fragments are
produced from single stranded oligonucleotides which are
annealed to generate appropriate double stranded molecules.

Of particular interest in such methods is the
production of fragment chains that may be used to store
information in the form of code which may readily be
accessed.

There is currently a great need for storing
information for different purposes (e.g. computer
software, music, films, databases etc.). It has
therefore been imperative to find efficient storage
media, resulting in the development of CD-ROMs, DVD
technology etc. Nucleic acid molecules offer far more
efficient methods for storing information and have
several advantages over storage methods currently in
use. For example, the storage capacity of nucleic acid
molecules is vast. In principle, a test-tube containing
DNA molecules may contain as much information as several
million CD-ROMs or more. Nucleic acid may be copied
quickly and efficiently using natural systems, which have
been greatly enhanced by techniques such as PCR, LCR etc.
When stored appropriately, nucleic acid molecules may be
preserved for extremely lengthy periods. Naturally
existing tools for the manipulation of nucleic acid
molecules are already available for processing of the
molecules, e.g. polymerases, restriction enzymes,
transcription factors, ribosomes etc. The nucleic acid
molecules may also have catalytic properties.

Furthermore, nucleic acid molecules may be used as
secure systems since they may be made such that they are
not readily copied, unlike current storage media such as
CDs, whose copying is increasingly prevalent.

Previously, however, it was not possible to take
advantage of the enormous potential offered by nucleic
acid molecules due to the absence of any effective
methods for writing or reading DNA messages. The method
described above overcomes this problem, allowing the
rapid synthesis of large DNA molecules, together with
methods of rapidly and efficiently scanning those
molecules to retrieve the information.

The key to effective retrieval of information
encoded by the nucleic acid molecules produced according
to the method described herein is the expansion of the
information providing unit in the molecule. In nature,
and in methods used previously, each base in the
sequence has an individual informational content.
Indeed methods have been described in which a single
base may signify more than a single informational unit,
e.g. in binary code, the bases A="00", C="01", G="10" and
T="11". Whilst this has advantages insofar as
significant amounts of information can be contained in a
single molecule, the system has serious drawbacks as it
requires writing and reading methods in which individual
bases may be attached and discriminated.
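
For comparison, the prior-art scheme mentioned above, in which each base carries two bits (A="00", C="01", G="10", T="11"), can be sketched in a few lines of Python. The function names are illustrative only. The sketch makes the drawback concrete: every single base of the output would have to be written, and later read, individually.

```python
# Two bits per base, using the mapping given in the text.
BITS_FOR_BASE = {"A": "00", "C": "01", "G": "10", "T": "11"}
BASE_FOR_BITS = {bits: base for base, bits in BITS_FOR_BASE.items()}


def bits_to_bases(bits: str) -> str:
    """Encode an even-length bit string as one base per pair of bits."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def bases_to_bits(seq: str) -> str:
    """Decode a base sequence back into its bit string."""
    return "".join(BITS_FOR_BASE[base] for base in seq)


if __name__ == "__main__":
    encoded = bits_to_bases("01001011")
    print(encoded)                 # CAGT
    print(bases_to_bits(encoded))  # 01001011
```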

In a preferred method of the invention therefore, 
information units are provided which are not single 
bases, but are instead short sequences. The techniques 
described above allow the rapid production of such 
chains and the information may be readily accessed. 

Thus units representing coded information may be 
generated and read. Each information unit may therefore 
represent an element of code, in which the code may for 
example be alphanumeric code or a simpler representation 
such as binary code. In each case it is necessary for 
individual elements of the code, e.g. "a", "b", "c",
"1", "0" etc., to be represented by an individualized and
specific sequence.

As used herein, "information units" refer to
discrete short sequences which represent a single piece
of information, e.g. one or more elements of a code (ie.
combinations thereof).

"Elements" of code, as mentioned above, refer to
the different members making up a code such as binary or
alphanumeric code.

Thus, in a preferred embodiment of the method of 
the invention, the fragments which are linked together 
comprise regions representing a unit of information 
corresponding to one or more code elements. Preferably 
the code is alphanumeric. Especially preferably the 
code is binary. Thus for example, considering a binary 
system of information capture, if one wishes to produce
chains consisting of "0" and "1" fragments, appropriate
sequence combinations may be attributed to "0" or "1".

Conveniently each of said one or more code elements
(together) has the formula

$(X)_a$,

wherein

X is a nucleotide A, T, G, C or a derivative
thereof which allows complementary binding and may be
the same or different at each position, and

a is an integer greater than 2, e.g. greater than
4, for example from 2 to 20, preferably from 4 to 10,
e.g. 6 to 8,

wherein $(X)_a$ is different for each code element (or
combination of code elements).

Especially preferably, in the case of binary code,
the code elements "1" and "0" may have the formulae:

"0" = $(X)_a$ and "1" = $(Y)_b$,

wherein

$(X)_a$ and $(Y)_b$ are not identical,

X and Y are each a nucleotide A, T, G, C or a
derivative thereof which allows complementary binding
and may be the same or different at each position, and

a and b are integers greater than 2, e.g. greater
than 4, for example from 2 to 20, preferably from 4 to
10, e.g. 6 to 8.

As referred to herein, a "derivative" which is
capable of complementary binding refers to a nucleotide
analog or variant which is capable of binding to a
nucleotide present in a complementary strand, and
includes in particular naturally occurring or synthetic
variants of nucleotides, e.g. uracil, or methylated or
amidated nucleotides etc.

In its simplest and preferred form, X and Y are the
same at each position, e.g. "0" = GGGGGGGG and
"1" = AAAAAAAA. However, repeat sequences such as [AC]6A
or [GT]6A may be used. The code sequence may also have a
functional property, e.g. it may be an integration
element such as AttP1 or AttP2.
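
A code table of the form "0" = $(X)_a$, "1" = $(Y)_b$ can be checked mechanically. The sketch below assumes only the four standard bases (the derivatives mentioned above are not modelled) and takes the broad length range of 2 to 20 from the text as defaults; `expand_repeat` and `is_valid_code` are hypothetical helper names.

```python
import re


def expand_repeat(unit: str, times: int, tail: str = "") -> str:
    """Expand repeat notation such as [AC]6A into a plain sequence."""
    return unit * times + tail


def is_valid_code(table: dict, min_len: int = 2, max_len: int = 20) -> bool:
    """True if every code element has its own A/C/G/T sequence of allowed length."""
    sequences = list(table.values())
    for seq in sequences:
        if not re.fullmatch(r"[ACGT]+", seq):
            return False
        if not min_len <= len(seq) <= max_len:
            return False
    # (X)a must differ for each code element.
    return len(set(sequences)) == len(sequences)


if __name__ == "__main__":
    simple = {"0": "G" * 8, "1": "A" * 8}
    repeats = {"0": expand_repeat("AC", 6, "A"), "1": expand_repeat("GT", 6, "A")}
    print(is_valid_code(simple), is_valid_code(repeats))  # True True
    print(is_valid_code({"0": "GGGG", "1": "GGGG"}))      # False
```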

It will however be appreciated that the sequences
described above may also denote more than a single code
element. Thus for example the information unit may
denote 2 or more code elements, e.g. from 2 to 32
elements, preferably from 2 to 4 code elements. If for
example binary code is considered, each information unit
may denote "01", "00", "11" or "10".

In the method described herein, chains comprising
such features may be prepared as follows. To produce a
chain of, for example, 8 "0"/"1" fragments, eight "0"
starting fragments with different overhangs and eight "1"
starting fragments with different overhangs are
generated, as illustrated in Figure 2. In this case the
"0" fragments consist of the sequence GGGGGGGG, although
this could be replaced by other sequences. In addition
the fragments are synthesized with unique overhangs such
that each may only be ligated at one position. Thus, the
fragments for position 1 in the chain are produced such
that they have an overhang which is complemented by one
of the overhangs in the fragments for position 2. Thus,
the position 2 fragments are synthesized such that they
can bind to position 1 fragments. Similarly, position 3
fragments may only bind to position 2 fragments at one of
their termini and to position 4 fragments at the other
terminus, and so forth. These fragments are stored
separately. In order to build up a chain, a selection is
made from one of the two alternatives for each position
such that the appropriate binary chain is produced.

Thus, in the scheme outlined above, to produce a
fragment chain which represents the chain 01001011, "0"
fragments from positions 1, 3, 4 and 6 are mixed with
"1" fragments from positions 2, 5, 7 and 8. If the
fragments are then ligated together by adding ligase or
using other ligation methods mentioned previously, the
above described chain will be produced. As will be
appreciated, this chain could also be achieved using, for
example, only 4 fragments if the information unit carried
on each fragment denoted 2 code elements.
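
The selection step can be sketched as follows. The library is modelled, purely for illustration, as one fragment per (position, information unit) pair with positions numbered from 1; `select_fragments` is a hypothetical helper. With one code element per fragment it reproduces the mixing plan given above for 01001011, and with two elements per fragment it gives the 4-fragment alternative.

```python
def select_fragments(bits: str, elements_per_unit: int = 1):
    """Return the (position, information unit) picks needed to write a binary chain."""
    if len(bits) % elements_per_unit:
        raise ValueError("chain length must be a multiple of elements_per_unit")
    units = [bits[i:i + elements_per_unit]
             for i in range(0, len(bits), elements_per_unit)]
    # Position p of the chain takes the fragment carrying the p-th unit.
    return list(enumerate(units, start=1))


if __name__ == "__main__":
    picks = select_fragments("01001011")
    print([p for p, unit in picks if unit == "0"])  # [1, 3, 4, 6]
    print([p for p, unit in picks if unit == "1"])  # [2, 5, 7, 8]
    print(select_fragments("01001011", elements_per_unit=2))
    # [(1, '01'), (2, '00'), (3, '10'), (4, '11')]
```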

It is furthermore possible to combine intermediate
fragment chains (e.g. containing at least 4 fragments)
with other fragment chains which, provided appropriate
overhangs exist at their termini, may be ligated together
to form composite fragment chains. Thus, several cycles
could be conducted in parallel and the products
combined. In the method shown in Figure 2, the end
fragments have blunt ends, but clearly appropriate
fragments could be used that similarly have overhangs at
the termini.

An appropriate technique for producing 8 fragment
chains, each containing 8 fragments, which can then be
ligated together is illustrated in Figure 3. For
fragment chain 1, end fragments are used such that it is
possible for the completed fragment chain to ligate to
fragment chain 2, and so on. These may then be combined
to produce a 64 fragment chain. Similarly, 8 such
64 fragment chains may be combined to produce fragment
chains comprising 512 fragments.

As will be appreciated, as with the production of 
shorter chains, the step of ligation, when performed, is 
conveniently effected once all the fragment chains have 
been combined. However, the step of ligation may be 
performed sequentially if desired on addition of each 
subsequent fragment chain. 

To combine 8 binary fragments per cycle, 16
different starting fragments are required, representing
the different "0", "1" alternatives at each position.
To make a chain of 64 fragments using two cycles, ie. to
produce 8 chains with 8 fragments which are then
ligated, only 16 + (4 × 7) = 44 starting fragments are
required. Thus, the number of different starting
fragments required increases almost linearly, in
contrast to the number of fragment chains which can be
produced, which increases exponentially with the number
of cycles. As a consequence, very long fragment chains
may be produced with a relatively small number of
starting fragments.
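
The arithmetic can be checked with a short sketch. It assumes, consistently with the figure of 16 + (4 × 7) = 44 given above, that the internal positions are shared by all intermediate chains and that each additional chain needs new fragments only for its two end positions; the function names are illustrative.

```python
def starting_fragments(per_chain: int, chains: int, elements: int = 2) -> int:
    """Distinct starting fragments for a two-cycle build (see assumption above)."""
    first_chain = elements * per_chain  # every position, every code element
    per_extra_chain = elements * 2      # only the two end positions are new
    return first_chain + per_extra_chain * (chains - 1)


def possible_chains(per_chain: int, chains: int, elements: int = 2) -> int:
    """Number of different chains that can be written."""
    return elements ** (per_chain * chains)


if __name__ == "__main__":
    print(starting_fragments(8, 1), possible_chains(8, 1))  # 16 256
    print(starting_fragments(8, 8), possible_chains(8, 8))  # 44 18446744073709551616
```

Going from one to eight chains multiplies the fragment count by less than three while the number of writable chains grows from 2^8 to 2^64.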

Of course, as mentioned previously, intermediate
chains longer or shorter than 8 fragments may be
produced. Since a large number of permutations exist in
the overhang region, more starting fragments may be used,
thus allowing longer chains to be built up in a single
cycle. Thus, the number of cycles necessary to produce
long chains may be reduced.
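
To give a sense of scale, the number of distinct overhang sequences of a given length can be enumerated. The sketch assumes standard 5' overhangs and treats two overhangs as a complementary pair when one is the reverse complement of the other; self-complementary (palindromic) overhangs are counted separately because such ends can bind copies of themselves. Lengths 4 and 5 correspond to the BbvI and HgaI overhangs listed in Example 1.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_overhangs(length: int):
    """Split all overhangs of a given length into palindromes and complementary pairs."""
    palindromes, pairs = [], set()
    for bases in product("ACGT", repeat=length):
        seq = "".join(bases)
        partner = revcomp(seq)
        if seq == partner:
            palindromes.append(seq)
        else:
            pairs.add(tuple(sorted((seq, partner))))
    return palindromes, sorted(pairs)


if __name__ == "__main__":
    for length in (4, 5):
        palindromes, pairs = classify_overhangs(length)
        print(length, len(palindromes), len(pairs))
    # 4 16 120
    # 5 0 512
```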

Small fragment chains produced according to the
methods described herein may also be attached together
by using variations of the techniques described herein.
For example, complementary primer pairs may be used to
link the various chains as described in Example 8. In
this technique, amplification of the fragment chains is
achieved using different primer pairs. The second
primer in primer pair 1 is complementary to the first
primer in primer pair 2, the second primer in that
pair is complementary to the first primer in primer pair
3, and so on. PCR reactions are then performed which
produce products which, in single stranded form, are able
to bind to one another through the complementary ends
introduced by the primer pairs. These may then be
ligated together.
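
The primer-pair linking scheme can be modelled in a simplified way. The sketch reads "complementary" as reverse complementary (both primers written 5' to 3'), idealizes each PCR product as forward primer + insert + reverse complement of the reverse primer, and shows that each product then ends with the start of the next one, which is the shared end through which the single strands bind. All sequences and helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_are_chained(primer_pairs) -> bool:
    """Second primer of pair i must be complementary to first primer of pair i+1."""
    return all(revcomp(second) == next_first
               for (_, second), (next_first, _) in zip(primer_pairs, primer_pairs[1:]))


def pcr_product(forward: str, insert: str, reverse: str) -> str:
    """Idealized top strand of a PCR product."""
    return forward + insert + revcomp(reverse)


def join_products(products, shared_ends) -> str:
    """Join products whose ends overlap by the given shared sequences."""
    joined = products[0]
    for shared, nxt in zip(shared_ends, products[1:]):
        if not (joined.endswith(shared) and nxt.startswith(shared)):
            raise ValueError("products do not share the expected end")
        joined += nxt[len(shared):]
    return joined


if __name__ == "__main__":
    f1, f2, f3, last = "TTTACGGA", "CCAGTAAC", "GAGTCTTC", "AGCTTGCA"
    pairs = [(f1, revcomp(f2)), (f2, revcomp(f3)), (f3, last)]
    inserts = ["GGGGGGGG", "AAAAAAAA", "GGGGAAAA"]
    products = [pcr_product(f, ins, r) for (f, r), ins in zip(pairs, inserts)]
    print(primers_are_chained(pairs))
    print(join_products(products, [f2, f3]))
```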

Alternatively, fragment chains prepared by the
methods described herein may be amplified with a primer
which contains the recognition site of a restriction
endonuclease which cleaves outside its recognition site
(e.g. a type IIS enzyme). These amplification products
are then digested with that nuclease to produce
non-palindromic overhangs at the end of each fragment
chain. By appropriate sequence selection (e.g. in the
primers or fragments which are used) the overhangs which
are generated allow the different fragment chains to be
combined in order.
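
The type IIS step can be sketched as below. The recognition site and cut offsets are parameters; the BbvI-like values in the example (site GCAGC, cuts 8 and 12 bases downstream, leaving a 4-base overhang) are an assumption to be checked against the enzyme supplier's data, and only sites on the given strand are searched. `junctions_compatible` applies a common design rule that is assumed here rather than taken from the text: each junction overhang must be non-palindromic and must not equal, or be complementary to, any other junction overhang.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def type_iis_overhangs(seq: str, site: str, top_cut: int, bottom_cut: int):
    """Top-strand overhangs left by an enzyme cutting downstream of its site.

    top_cut and bottom_cut count bases after the end of the recognition site,
    as in N(top/bottom) notation; only sites on `seq` itself are searched.
    """
    overhangs = []
    start = seq.find(site)
    while start != -1:
        end = start + len(site)
        if end + bottom_cut <= len(seq):
            overhangs.append(seq[end + top_cut:end + bottom_cut])
        start = seq.find(site, start + 1)
    return overhangs


def junctions_compatible(overhangs) -> bool:
    """Each junction overhang must be non-palindromic and pair with nothing else."""
    seen = set()
    for oh in overhangs:
        partner = revcomp(oh)
        if oh == partner or oh in seen or partner in seen:
            return False
        seen.update((oh, partner))
    return True


if __name__ == "__main__":
    amplicon = "GCAGC" + "AAAAAAAA" + "TTGC" + "GGGG"
    print(type_iis_overhangs(amplicon, "GCAGC", 8, 12))  # ['TTGC']
    print(junctions_compatible(["TTGC", "ACCT"]))        # True
    print(junctions_compatible(["TTGC", "GCAA"]))        # False: GCAA pairs with TTGC
```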

In a preferred aspect, therefore, the invention
provides a method of synthesizing a double stranded
nucleic acid molecule comprising at least the steps of:

1) generating fragment chains according to the method
described hereinbefore;

2) optionally generating single stranded regions at
the end of said fragment chains, wherein said single
stranded regions are complementary to other single
stranded regions on said fragment chains, thus forming
complementary pairs of single stranded regions;

3) contacting said fragment chains with one another,
simultaneously or consecutively, to effect binding of
said complementary pairs of single stranded regions.

Optionally said chains are ligated together;
however, alternative techniques may be used to form the
ultimate chain, e.g. PCR may be used as described
herein.

Preferably intermediate fragment chains are between
4 and 20 fragments in length, e.g. 5 to 10, and between
5 and 50 such fragment chains are combined, e.g. between
10 and 20.

Conveniently, fragments to be used in the method of
the invention are contained within libraries. Methods
of producing the fragments which make up the library are
well known in the art. For example, a series of
oligonucleotides may be produced which comprise two
portions: a first portion which will form an overhang
at one end, and a second portion which will effect
binding to a complementary oligonucleotide and which
contains within it the information unit. By
producing common hybridizing portions and variant
overhangs, a series of double stranded oligonucleotides
for one or more code elements (denoted by at least a
part of the hybridizing portion) is created. This
provides a library for one (or a combination of) code
elements. Different libraries may be created for
different code elements (or combinations thereof) by
appropriate alteration of the information unit, ie. the
sequence in the hybridizing portion.
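
The two-portion design can be sketched as follows, assuming 5' overhangs and a blunt start and end of the chain; `fragment_oligos` and `element_library` are hypothetical helpers. The top oligonucleotide is the left overhang followed by the information unit, and the bottom oligonucleotide is the right overhang followed by the reverse complement of the information unit, so the information unit forms the hybridizing portion. Varying only the overhangs from position to position gives the library for one code element.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fragment_oligos(left_overhang: str, info_unit: str, right_overhang: str):
    """Return (top, bottom) oligonucleotides, both written 5' to 3'."""
    top = left_overhang + info_unit               # overhang portion + hybridizing portion
    bottom = right_overhang + revcomp(info_unit)  # pairs with the information unit
    return top, bottom


def element_library(info_unit: str, junctions):
    """One fragment per chain position for a single code element.

    junctions[i] is the top-strand overhang joining positions i+1 and i+2;
    the two ends of the chain are left blunt.
    """
    n = len(junctions) + 1
    library = {}
    for position in range(1, n + 1):
        left = junctions[position - 2] if position > 1 else ""
        right = revcomp(junctions[position - 1]) if position < n else ""
        library[position] = fragment_oligos(left, info_unit, right)
    return library


if __name__ == "__main__":
    zeros = element_library("GGGGGGGG", ["ACCT", "CAAG"])
    for position, (top, bottom) in zeros.items():
        print(position, top, bottom)
```

A "1" library would be built the same way with a different information unit and the same junctions, so that "0" and "1" fragments for a given position are interchangeable.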

Conveniently, for use in the invention, these
different double stranded oligonucleotides are arranged
in 2-dimensional arrays such that in one dimension
consecutive positions within the ultimate fragment chain
are indicated and in the second dimension the possible
code elements (or combinations thereof) are provided. In
the simplest case, binary code, in which "0" and "1" are
represented by different sequences, the first dimension
would comprise fragments for each position of the
proposed fragment chain and the second dimension would
have only 2 variants ("0" and "1"). This may be viewed as
a single library or as two libraries, ie. the "0" and "1"
libraries. Once these libraries are produced, fragment
chains with any desired order of fragments may be
readily produced.

In order to direct library members to their correct
site or well (ie. the library may be comprised of
separate solid supports, a solid support with different
addresses, e.g. wells, or different wells containing
different solutions), any appropriate sorting technique
may be used. This sorting may be achieved by virtue of
the process used for production of the library members,
or by an appropriate technique, e.g. by binding to
complementary oligonucleotides at the relevant library
site.

Solid supports suitable for attaching library members
are well known in the art and widely described in the
literature. Generally speaking, the solid support may be
any of the well-known supports or matrices which are
currently widely used or proposed for immobilization,
separation etc. in chemical or biochemical procedures.
Thus, for example, the immobilizing moieties may take the
form of beads, particles, sheets, gels, filters,
membranes, microfibre strips, tubes or plates, fibres or
capillaries, made for example of a polymeric material,
e.g. agarose, cellulose, alginate, Teflon, latex or
polystyrene. Particulate materials, e.g. beads, are
generally preferred. Conveniently, the immobilizing
moiety may comprise magnetic particles, such as
superparamagnetic particles.

In a preferred embodiment, plates or sheets are
used to allow fixation of molecules in linear
arrangement. The plates may also comprise walls
perpendicular to the plate on which molecules may be
attached. Attachment to the solid support may be
performed directly or indirectly, and the technique which
is used will depend on whether the molecule to be
attached is an oligonucleotide for fixing the library
member or the library member itself. For attaching the
library members themselves, ie. not via binding to an
oligonucleotide, attachment may conveniently be
performed indirectly by the use of an attachment moiety
carried on the nucleic acid molecules and/or solid
support. Thus, for example, a pair of affinity binding
partners may be used, such as avidin or streptavidin and
biotin, DNA and a DNA binding protein (e.g. the lac
operator sequence and the lac I repressor protein which
binds to it), or antibodies (which may be mono- or
polyclonal), antibody fragments and the epitopes or
haptens of antibodies. In these cases, one partner of
the binding pair is attached to (or is inherently part
of) the solid support and the other partner is attached
to (or is inherently part of) the nucleic acid
molecules. Alternatively, techniques of direct
attachment may be used; for example, if a filter is
used, attachment may be performed by UV-induced
crosslinking. When attaching DNA fragments, the natural
propensity of DNA to adhere to glass may also be used.
Oligonucleotides to be used for capture of the library
members may be attached to the solid support via
appropriate functional groups on the solid support.

Attachment of appropriate functional groups to the
solid support may be performed by methods well known in
the art, which include, for example, attachment through
hydroxyl, carboxyl, aldehyde or amino groups which may
be provided by treating the solid support to provide
suitable surface coatings. Attachment of appropriate
functional groups to the nucleic acid molecules of the
invention may be performed by ligation or introduced
during synthesis or amplification, for example using
primers carrying an appropriate moiety, such as biotin
or a particular sequence for capture.

In a further aspect, therefore, the present invention
provides a library of fragments as defined herein
comprising n × m fragments, wherein n is as defined
hereinbefore and corresponds to the length of chain that
said library may produce, and m is an integer
corresponding to the number of possible code elements or
combinations thereof, such that fragments corresponding
to all possible code elements for each position in the
final chain are provided.

Portions of said libraries in one dimension, ie.
comprising n fragments for only a single code element
(or combinations thereof) or comprising m fragments
representing all code elements (or combinations thereof)
for a single position on the chain, form further aspects
of the invention.
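
The n × m arrangement and its one-dimensional portions can be pictured as a grid keyed by (position, code element). In this illustrative sketch the fragment entries are placeholder labels rather than sequences, and `build_library`, `element_row` and `position_column` are hypothetical names.

```python
def build_library(n_positions: int, elements):
    """n x m library: one placeholder fragment per (position, code element)."""
    return {(p, e): f"pos{p}:{e}"
            for p in range(1, n_positions + 1) for e in elements}


def element_row(library, element):
    """Portion with n fragments for a single code element, in chain order."""
    return [frag for (p, e), frag in sorted(library.items()) if e == element]


def position_column(library, position):
    """Portion with m fragments covering all code elements at one position."""
    return [frag for (p, e), frag in sorted(library.items()) if p == position]


if __name__ == "__main__":
    lib = build_library(8, ["0", "1"])
    print(len(lib))                   # 16
    print(element_row(lib, "1")[:3])  # ['pos1:1', 'pos2:1', 'pos3:1']
    print(position_column(lib, 5))    # ['pos5:0', 'pos5:1']
```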

Appropriate mixing may be achieved by automation.
For example, in the case of "0", "1" fragments, the
correct combination of these elements is the critical
step in terms of resource- and time-consumption. This
method is described in more detail in Example 2. In
particular, the procedure may be miniaturised provided
appropriate amplifying methods (such as cloning and/or
PCR) are employed in the last step. Thus, techniques
such as sorting with flow cytometers may be employed, as
illustrated in Figure 4C. Such sorting procedures are
well established and are able to sort approximately
5-30000 droplets per second with standard equipment, and
up to 300000 droplets per second with the most advanced
cytometers.

As mentioned previously, each fragment may denote more
than a single code element. If, for example, each
fragment denotes 5 code elements, then using existing
technology and a library of 32 × 100 library components,
with 3200 containers connected to a sorting device as
illustrated in Figure 4C, it should be possible to write
several thousand chains with 500 code elements per
second. Clearly, a method which can generate nucleic
acid sequences with such rapidity offers significant
advantages over known methods in the art.
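
The figures in the preceding paragraph follow from simple arithmetic, sketched below under the assumption (implicit in the 32 × 100 library) that the code is binary and that a chain has 100 positions: 5 binary code elements per fragment give 2^5 = 32 variants per position. The function name is illustrative.

```python
def library_plan(elements_per_fragment: int, positions: int, code_size: int = 2) -> dict:
    """Library and chain sizes when each fragment carries several code elements."""
    variants = code_size ** elements_per_fragment  # distinct information units per position
    return {
        "variants_per_position": variants,
        "containers": variants * positions,  # one container per library member
        "code_elements_per_chain": elements_per_fragment * positions,
    }


if __name__ == "__main__":
    print(library_plan(5, 100))
    # {'variants_per_position': 32, 'containers': 3200, 'code_elements_per_chain': 500}
```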

The nucleic acid molecule (ie. the fragment chain)
produced according to the above described method and the
single stranded molecules thereof form further aspects
of the invention. These molecules may, as appropriate,
be included in a vector, as described hereinbefore.

Once produced, the fragment chains, in double
stranded or single stranded form, may be used in various
applications, as described hereinafter. One application
of particular utility is to store information. In such
cases appropriate means of reading the information
stored in those chains are required. In some
applications, fragment chains may be appropriately
addressed to particular sites, e.g. through binding to
oligonucleotides carried on solid supports which are
complementary to overhangs on one terminus of the
fragment chains. Alternatively, appropriate
antibody/antigen or DNA:protein recognition systems may
be used. Information stored in molecules addressed in
this way, or in solution, may then be accessed.

Co-pending application PCT/GB99/04417, a copy of
which is appended hereto, describes appropriate
techniques for addressing and reading information
contained in nucleic acid molecules. Of particular note
in this respect are techniques in which the fluorescence
of probes carrying fluorescent labels directed to
particular sequences is detected. In such techniques,
probes carrying labels as described hereinbefore may
be directed to particular fragment regions, particularly
to regions denoting code elements. The signals
generated (directly or indirectly) by those labels may
then be detected and the code element thereby
identified. If a simple binary system is used, only 2
discrete labels are required and their pattern of
binding may be determined. Alternatively, if a more
complex code is reflected in the fragment chains,
correspondingly more discrete labels are required for
unambiguous detection.

Thus in a further aspect, the present invention
provides a method of identifying the code elements
contained in a nucleic acid molecule prepared as
described hereinbefore (ie. a fragment chain), wherein a
probe carrying a signalling means (e.g. a label),
specific to one or more code elements, is bound to said
nucleic acid molecule and a signal generated by said
signalling means is detected, whereby said one or more
code elements may be identified.

Preferably said signalling means is a label as
described hereinbefore.

A "probe" as referred to herein is an
appropriate nucleic acid molecule, e.g. made up of DNA,
RNA or PNA sequences, or hybrids thereof, which is able
to bind to the target nucleic acid molecule (which may
be single or double stranded) through specific
interactions, ie. is specific to particular code
elements, e.g. through complementary binding to a
particular sequence. Probes may be of any convenient
length that allows specific binding, e.g. in the order
of 5 to 50 bases, preferably 8 to 20 bases in length.

A "signalling means" as used herein refers to a
means for generating a signal directly or indirectly. A
signal may be any physical or chemical property which
may be detected, e.g. presence of a particular product,
colour, fluorescence, radiation, magnetism,
paramagnetism, electric charge, size, or volume.
Preferably the label is a fluorophore whose fluorescence
is detected. In such cases fluorescence scanners may be
used for detection of the label and thereby
identification of the code elements.

A particular code element or combination of
elements may be identified by the appearance of a
particular signal. Clearly the position of each signal
is crucial to determining the sequence of the code
elements. As a consequence, methods in which positional
information (absolute or relative) may be obtained
should be used. Appropriate techniques, e.g. using
target molecules which have been attached to a solid
support at one end, are described in co-pending
application PCT/GB99/04417.
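
Once signals and their positions along an addressed molecule have been recorded, decoding reduces to a lookup in position order. The sketch assumes a binary code with one hypothetical label per code element (the dye names are placeholders) and detections given as (position, label) pairs; it is not a model of any particular scanner's output.

```python
LABEL_FOR_ELEMENT = {"0": "dye_A", "1": "dye_B"}  # hypothetical label assignment


def decode_signals(signals, label_for_element=LABEL_FOR_ELEMENT) -> str:
    """Turn (position, label) detections into a code string ordered by position."""
    element_for_label = {label: el for el, label in label_for_element.items()}
    if len(element_for_label) != len(label_for_element):
        raise ValueError("each code element needs its own discrete label")
    # Position, not detection order, fixes the sequence of code elements.
    ordered = sorted(signals, key=lambda signal: signal[0])
    return "".join(element_for_label[label] for _, label in ordered)


if __name__ == "__main__":
    detections = [(3.1, "dye_A"), (0.4, "dye_A"), (1.9, "dye_B")]
    print(decode_signals(detections))  # 010
```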

A number of applications exist for the fragment
chains, once produced, in nano- and pico-technology,
inter alia, for example, by stretching the fragment
chains by means of a stream of liquid, electricity or
other technology and using them as templates for nano-
and pico-structures. The products may also be used to
label products which can then be screened to establish
their identity. Alternatively, the molecules may be used
to store information, e.g. pictures, text or music, or as
data storage in DNA computers. The rapid production and
reading techniques make such applications possible for
the first time.

Kits for performing the methods described above
form a preferred aspect of the invention. Thus, viewed
from a further aspect, the present invention provides a
kit for synthesizing a double stranded nucleic acid
molecule comprising at least n double stranded nucleic
acid fragments, wherein at least n-2 fragments have
single stranded regions at both termini and 2 fragments
have single stranded regions at at least one terminus,
wherein (n-1) single stranded regions are complementary
to (n-1) other single stranded regions, thereby
producing (n-1) complementary pairs. Preferably, more
than n fragments are supplied for production of a
chain of n fragments, such that selection of appropriate
fragments for different positions is possible. Thus, in
a preferred feature, said kit comprises n × m fragments,
wherein n is as defined hereinbefore and m is an
integer corresponding to the number of possible
variations, e.g. unique sequences or code elements or
combinations thereof, such that fragments corresponding
to all possible sequences or code elements for each
position in the final chain are provided. Preferably
these fragments are provided in appropriate libraries
arranged with reference to their position within the
fragment chain and the code element(s) which they
represent, such that desired fragments may be readily
selected from the array.

Optionally the kit may contain other appropriate
components selected from the list including ligases,
enzymes necessary for inactivation and activation of
restriction or ligation sites, primers for amplification
and/or appropriate enzymes, buffers and solutions. The
use of such kits for performing the method of the
invention forms a further aspect of the invention.

The following examples are given by way of illustration
only, in which the Figures referred to are as follows:

Figure 1 shows a schematic representation of how the
method of the invention may be used to introduce an
insert into a vector, in which the insert is cleaved
from the first nucleic acid molecule, associated with
adapters and ligated thereto and then ligated into the
vector;

Figure 2 shows the production of a fragment chain using
8 "0" and 8 "1" starting fragments with different
overhangs;

Figure 3 shows the production of a 64 fragment chain in
which 8 chains are produced comprising 8 fragments each,
in which the termini of chains 1 and 2, and 2 and 3 etc.
are complementary such that they may be ligated
together;

Figure 4 shows 3 techniques for mixing "0", "1"
fragments from a library of fragments ordered for each
position, in which in A) appropriate fragments are
selected by aspiration from appropriate wells, B)
appropriate fragments are released from the library
wells and C) a flow cytometer is used to direct
appropriate droplets to the mixing chamber;

Figure 5 shows PCR amplification of signal chain
1-0-1-0-0 using SP6 and T7 primers. Lane 1: 1 µg of 1 kb
DNA ladder (Gibco BRL); Lane 2: 10 µl of fragment chain
DNA PCR amplified using SP6 and T7 primers; Lane 3:
same as lane 2 except for the use of SP6 and T7-Cy5
primers; and

Figure 6 shows the use of primer pairs during the
process of amplification to join together fragment
chains.

EXAMPLE 1: CLONING OF AN INSERT INTO A VECTOR, FOR
EXAMPLE FROM PHIX174 INTO PUC19



A general procedure for achieving cloning using IIS and
IP enzymes involves the use of a cloning vector which has
the following characteristics:
1) A multiple cloning site (MCS) located within a gene
(lacZ, ccdB or other) that allows the detection of
successful insertion.

2) The multiple cloning site contains two flanking
HgaI sites that generate overhangs differing from
other HgaI-generated overhangs elsewhere in the vector.
The orientation of the HgaI sites ensures that the sites
themselves are excised from the vector part during
digestion. To minimize background due to undigested
plasmids, several HgaI sites and other suitable
restriction enzyme sites are included in the MCS. These
restriction enzymes are chosen such that they cleave
well in HgaI buffer and do not have other sites in the
vector.

The donor plasmid is cut with the appropriate set of IIS
and/or IP enzymes. Adapters are used to specify the
fragment to be sub-cloned into the vector, by the use of
single stranded regions on the adapters which are
complementary to the overhangs generated on the insert.
This results in the molecule: vector - adapter I - insert
(e.g. PhiX174 gene) - adapter II - vector.



This method is illustrated for insertion of a PhiX174
insert into a vector, e.g. pUC19. An HgaI site in a
pUC19 plasmid is chosen randomly to be our "polylinker",
while different genes and gene combinations from the
PhiX174 genome are used as "inserts".

The genes are organized in the PhiX174 genome as
illustrated below, which shows the positions of genes A,
B, C and E relative to one another:
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In the above, gene B is located inside gene A while gene 
C is slightly overlapping with gene A (by 3 base pairs ) . 
Gene D and K are located in the same area as gene C and 
E, but are not shown. This genome area contains 9 Bbvl 
sites as shown on the bottom row, in which the overhang 
pairs that will be generated by cutting with Bbvl are as 
follows with the base pair position indicated in 
brackets: 1-CAGC/GTCG (3798) , 2-CTGC/GACG (4215) , 3- 
ACGG/TGCC (4398), 3 -GCAT/CGTA (4677), 5-CTAT/GATA 
(5049), 6-GAGA/CTCT (158), 7-GAGC/CTCG (547), 8- 
CAAC/GTTG (624) , 9-CCAT/GGTA (892) . The parts of the 
PhiX174 genome not shown contain 5 more Bjbvl sites: 10- 
TACC/ATGG (1488) , 11-TACC/ATGG (1592) , 12-CTAC/GATG 
(1639) , 13-GCAC/CGTG (3294), 14-CTAA/GATT (3297). Of 
these only 12 give rise to non- identical overhangs 
whilst 2 result in identical overhangs. 

When HgaI is used to cleave pUC19, 4 non-identical sites
are cleaved, giving rise to 8 non-identical overhangs.
These are: 1-CTGCC/GACGG (573), 2-TTCTC/AAGAG (1131),
3-CAAGG/GTTCC (1881), 4-AGACT/TCTGA (2459).
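As a quick consistency check on the two overhang lists above, the short Python sketch below tallies the BbvI overhang pairs of PhiX174 and the HgaI overhangs of pUC19. It only re-uses the sequences and site numbers given in the text; the dictionary and function names are illustrative, not part of the described method.

```python
from collections import Counter

# BbvI overhang pairs in PhiX174, keyed by site number as listed in the text.
BBVI_PHIX174 = {
    1: "CAGC/GTCG", 2: "CTGC/GACG", 3: "ACGG/TGCC", 4: "GCAT/CGTA",
    5: "CTAT/GATA", 6: "GAGA/CTCT", 7: "GAGC/CTCG", 8: "CAAC/GTTG",
    9: "CCAT/GGTA", 10: "TACC/ATGG", 11: "TACC/ATGG", 12: "CTAC/GATG",
    13: "GCAC/CGTG", 14: "CTAA/GATT",
}

# HgaI overhang pairs in pUC19 (four sites).
HGAI_PUC19 = {1: "CTGCC/GACGG", 2: "TTCTC/AAGAG", 3: "CAAGG/GTTCC", 4: "AGACT/TCTGA"}


def overhang_summary(sites: dict[int, str]) -> tuple[list[int], dict[str, list[int]]]:
    """Return (sites whose overhang pair is unique, {shared pair: sites})."""
    counts = Counter(sites.values())
    unique = [site for site, pair in sites.items() if counts[pair] == 1]
    shared: dict[str, list[int]] = {}
    for site, pair in sites.items():
        if counts[pair] > 1:
            shared.setdefault(pair, []).append(site)
    return unique, shared


unique, shared = overhang_summary(BBVI_PHIX174)
print(len(unique), shared)  # 12 {'TACC/ATGG': [10, 11]}

# Each HgaI site contributes two single overhangs; all 8 should be different.
hga_overhangs = [half for pair in HGAI_PUC19.values() for half in pair.split("/")]
print(len(hga_overhangs), len(set(hga_overhangs)))  # 8 8
```

Only exact identity of the written pairs is checked here, which is the sense in which the text counts identical overhangs.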

Method:

To sub-clone gene B from bacteriophage PhiX174 into the
designed vector, the following protocol is used:

1) 2 µg of PhiX174 DNA is digested with 2 U of BbvI (NEB)
in 1X buffer 2 (NEB), with water added to a volume of
20 µl, for 1 hr at 37°C. BbvI is then heat-inactivated at
65°C for 20 minutes.

2) 2 µg of vector (e.g. pUC19) is digested with 2 U HgaI
(NEB) in 1X buffer 1 (NEB), with water added to a volume
of 20 µl, for 1 hr at 37°C. HgaI is then heat-inactivated
at 65°C for 20 minutes.

3) The adapters are made in separate tubes by mixing
pairs of oligonucleotides (selected to obtain the
desired product, i.e. particular gene(s), in
forward/reverse orientation) and allowing them to anneal.

4) 6 µl of the PhiX174 cleavage reaction is mixed
with 3 µl of the vector cleavage reaction and
ligated in the presence of 5-50 pmol of each adapter,
2-10 U/µl T4 DNA ligase (NEB), 1X ligase buffer (NEB)
and 5% polyethylene glycol 8000, with water added to a
volume of 30 µl, at 25°C for 1 hr.

5) Conventional methods are used to transform bacteria.

6) The colonies are then counted and some of them are
picked for further analysis (sequencing and the like).

Materials:

Oligonucleotides used to address PhiX174 overhangs:
BbvI overhang 1a:
5'-CGA GCG CCT CCA GTG CAG CGG AG
BbvI overhang 5a:
5'-TATC GCG CCT CCA GTG CAG CGG AG
BbvI overhang 6b:
5'-CTCT GCG CCT CCA GTG CAG CGG AG
BbvI overhang 6 (delC):
5'-CTCT CTC CGC TGC ACT GGA GGC GC
BbvI overhang 7a:
5'-CAAC GCG CCT CCA GTG CAG CGG AG
BbvI overhang 9b:
5'-GGTA GCG CCT CCA GTG CAG CGG AG

Oligonucleotides used to address pUC19 overhangs:
Cloning site 1a:
5'-AAGAG CTC CGC TGC ACT GGA GGC GC
Cloning site 1b:
5'-CTCTT CTC CGC TGC ACT GGA GGC GC

Two important advantages of this recombination method
over the classical Cohen-Boyer method should be noted.
First, the procedure is very easy to perform: it involves
only mixing and incubation steps before transformation,
and no PCR amplifications or gel separations are
required. Second, the method gives significant
flexibility and allows complex recombinations to be made
even with only two restriction enzymes.

EXAMPLE 2: AUTOMATION AND MINIATURISATION OF CHAIN
SYNTHESIS

This method describes a rapid process for mixing
appropriate "0" and "1" fragments with the correct
overhangs to produce a particular string consisting of
"0"s and "1"s.

Two libraries are produced, one with "0" fragments and
one with "1" fragments. As mentioned in the
description, these are generated with overhangs that can
be ligated to the corresponding overhangs of fragments at
adjacent positions. The members of each library are kept
in separate wells, such that position 1 fragments are
present in well 1, position 2 fragments are present in
well 2, and so forth. The two libraries thus provide the
alternatives for each position. To generate a chain,
therefore, it is only necessary to select the correct
fragment, "0" or "1", for position 1, then for position
2, and so on. Since these fragments, as a consequence of
their unique overhangs, can only hybridize to fragments
for adjacent positions, the selected fragments can simply
be mixed and ligated simultaneously. Different ways of
achieving this are shown in Figure 4, which shows three
different alternatives for mixing.

In Figure 4A, e.g. to produce the chain 0-1-0-0-1, the
apparatus is used to aspirate from the "0" library at
positions 1, 3 and 4, and from the "1" library at
positions 2 and 5. The aspirated liquids may then be
mixed together with ligase and an appropriate buffer. In
alternative B, each well in the library is connected to
a tube/nozzle that may be opened or closed
electronically. Liquid from the nozzles is directed into
the ligation chamber together with ligase and an
appropriate buffer. Different chains may be constructed
by appropriately changing the pattern of nozzles which
are opened/closed.
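The selection logic for alternatives A and B reduces to reading the target chain digit by digit. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes, as described above, that well k of each library holds the fragment for position k.

```python
def aspiration_plan(chain: str) -> dict[str, list[int]]:
    """Map each library ("0" or "1") to the wells to aspirate from (alternative A).

    Assumes well k of each library holds the fragment for position k,
    so the well number equals the chain position.
    """
    digits = chain.replace("-", "")
    if set(digits) - {"0", "1"}:
        raise ValueError(f"chain must contain only 0 and 1: {chain!r}")
    plan: dict[str, list[int]] = {"0": [], "1": []}
    for position, digit in enumerate(digits, start=1):
        plan[digit].append(position)
    return plan


def nozzle_pattern(chain: str) -> list[tuple[bool, bool]]:
    """Alternative B: per position, (nozzle open in "0" library, nozzle open in "1" library)."""
    plan = aspiration_plan(chain)
    n = len(plan["0"]) + len(plan["1"])
    return [(k in plan["0"], k in plan["1"]) for k in range(1, n + 1)]


print(aspiration_plan("0-1-0-0-1"))  # {'0': [1, 3, 4], '1': [2, 5]}
print(nozzle_pattern("1-0"))         # [(False, True), (True, False)]
```

Exactly one nozzle per position is open, which mirrors the requirement that each position receives either the "0" or the "1" fragment, never both.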

The procedure may also be miniaturised, e.g. using flow
cytometry technology as illustrated in Figure 4C. In
this method, library components are stored in containers
on top of the "writing machine". Droplets from each
container are then guided either to the waste or to the
production well, depending on the chain that is to be
constructed. The guiding mechanism is as used in ordinary
flow cytometers, i.e. the droplets are charged when they
leave the container and may be guided electronically in
different directions.

EXAMPLE 3 - LIBRARIES COMPRISING OLIGONUCLEOTIDES FOR
USE IN THE INVENTION

Conveniently, the cloning method may be performed using
libraries containing oligonucleotides. For example, a
library may contain:

1. Oligonucleotides with a common portion and 5 bases
at the 5' end which vary to provide all possible
permutations, i.e. 1024 variants.

2. Oligonucleotides with a common portion and 4 bases
at the 5' end which vary to provide all possible
permutations, i.e. 256 variants.

3. Oligonucleotides with a common portion and 5 bases
at the 3' end which vary to provide all possible
permutations, i.e. 1024 variants.

4. Oligonucleotides with a common portion and 6 bases
at the 3' end which vary to provide all possible
permutations, i.e. 4096 variants.

In the above, the oligonucleotides are produced such
that all "1" oligonucleotides are complementary to "2"
oligonucleotides by virtue of the invariant bases, i.e.
to generate a double stranded molecule with variable
4- and 5-base overhangs. Similarly, "3" and "4"
oligonucleotides are complementary.
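The variant counts in items 1 to 4 are simply 4^n for n variable bases. The Python sketch below enumerates the variable regions to confirm the figures; the `LIBRARY` table and the function name are illustrative labels for the four members described above.

```python
from itertools import product

BASES = "ACGT"


def variable_regions(length: int) -> list[str]:
    """All possible variable regions of the given length (4**length of them)."""
    return ["".join(p) for p in product(BASES, repeat=length)]


# Library members 1-4 as described above: (end carrying the variable bases, number of bases).
LIBRARY = {1: ("5'", 5), 2: ("5'", 4), 3: ("3'", 5), 4: ("3'", 6)}

for member, (end, length) in LIBRARY.items():
    print(member, end, length, len(variable_regions(length)))
# 1 5' 5 1024
# 2 5' 4 256
# 3 3' 5 1024
# 4 3' 6 4096
```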

Oligonucleotides combined in this way (i.e. with
overhangs of 4-6 bases at either end) may also be
combined with complementary double stranded
oligonucleotides, also generated by combining certain
members of the library. In this way, variable overhangs
of different lengths may be created in the resultant
molecule, e.g. a molecule with a 4 base overhang at both
the 3' and 5' ends.

Oligonucleotides may also be provided in the library
which allow 5' and 3' adapters to be linked. Thus, for
example, oligonucleotides having the following form may
be provided:

5. 5'-AAAA-[comp1]-FFFFF-3'
6. 5'-DDDDD-[comp1]-FFFFF-3'
7. 5'-AAAA-[comp1]-HHHHHH-3'
8. 5'-DDDDD-[comp1]-HHHHHH-3'

WO 01/00816                              PCT/GB00/02512

9. 3'-[comp1*]-5'
10. 5'-BBBB-[comp2]-3'
11. 5'-EEEEE-[comp2*]-3'
12. 5'-[comp3]-GGGGG-3'
13. 5'-[comp3*]-IIIIII-3'

in which "compx" refers to a region which is
complementary to region "compx*", i.e. "5", "6", "7" or
"8" can bind to "9". Furthermore, "comp2" can bind to
oligonucleotide "1" above, "comp2*" can bind to
oligonucleotide "2", "comp3" can bind to oligonucleotide
"4" and "comp3*" can bind to oligonucleotide "3". The
bases denoted "A" bind to "B", i.e. "7" and "10" can bind
at their ends. Similarly, "D" binds to "E", "F" binds to
"G" and "H" binds to "I". (These bases, when together,
may have a variable content, e.g. AAAA = GAGA and then
BBBB = TCTC.)

By appropriate use of the linkers described above, 5'
and 3' adapters may be combined. For example,
oligonucleotide "2" with a particular 4 base 5' overhang
may be bound through its complementary region to
oligonucleotide linker "11", which will then leave an
"EEEEE" overlap. This may be bound to oligonucleotide
"8" through the overlap, and "8" may itself bind
oligonucleotide "9" through its complementary region.
The "HHHHHH" overlap may be bound to oligonucleotide
"13", which may attach an oligonucleotide "4" through
binding to the complementary region. Thus various
permutations may be made which result in various overlap
lengths, e.g. any combination of 4, 5 or 6 base
overlaps, which may be on the same or different strands.

EXAMPLE 4 - TRIMMING PROCEDURE FOR GENERATING UNIQUE
OVERHANGS

The system presented here makes it possible to perform a
trimming procedure with seven different IIS enzymes that
make 5' 4 base overhangs (FokI and Bst71I), 5' 5 base
overhangs (HgaI), 3' 5 base overhangs (BplI and BaeI)
and 3' 6 base overhangs (CjeI and HaeIV). If the
oligonucleotide system presented here is combined with
the basic oligonucleotide kit described in Example 3,
all permutations of 3' 5 base and 6 base overhangs and
all permutations of 5' 4 base and 5 base overhangs can
be addressed for the trimming procedure.

In this Example, the location of the binding motifs of
the initiation linkers is shown below:

FokI       GGATG
Bst71I     GCAGC
HgaI       GACGC
BplI       GAG CTC
BaeI       CYATG CA
CjeI       CCA GT
HaeIV      GAY RTC

Consensus  --GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC

Initiation linkers:

X=0: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGPPPPPP
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC

X=1: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATG-PPPPPP
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC-

X=2: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATG--PPPPPP
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC--

X=3: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATG---PPPPPP
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC---

X=4: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGCPPPPPP
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG

X=5: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC-PPPPPP
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG-

X=6: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC--PPPPPP
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG--

X=7: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC---PPPPPP
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG---

X=8: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC----PPPPPP
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG----

X=9: 5'--GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC-----PPPPPP
     3'--CGTCGCTGGTACTCAGGT-GAG--CACCTACTGCG-----

The 6 base 3' overhang PPPPPP is a non-palindromic
sequence that can be ligated with the complementary
overhang QQQQQQ. Ten different initiation linkers are
needed because BaeI cuts 10 bases away from its binding
site. These linkers therefore allow a trimming procedure
in which BaeI "jumps" ~10 bases in each trimming cycle,
so 10 different start positions are necessary to cover
all possibilities. On the other hand, HgaI cuts only 5
bases away, necessitating only 5 different start
positions. This is why the binding site for HgaI is not
present on X=0 - X=3 above.

Propagation linkers:

FokI:    5' GGATG
         3' CCTACNNNN
Bst71I:  5' GCAGC
         3' CGTCGNNNN
HgaI:    5' GACGC
         3' CTGCGNNNNN
BplI:    5' GAG CTCNNNNN
         3' CTC GAG
BaeI:    5' CCATG CANNNNN
         3' GGTAC GT
HaeIV:   5' GAC --GTCNNNNNN
         3' CTG CTG
CjeI:    5' CCA GTNNNNNN
         3' GGT CA
Termination linkers:

The adapters made with the basic oligonucleotides
described earlier can be used as termination linkers.
There is therefore no need for a separate set of
termination linkers.



Method:

In this method, a trimming reaction using Bst71I that
will begin on a 3' 5 base overhang is shown. The target
DNA is shown below, in which the first overhang that will
be generated is marked "*":

        ****
3' CACTT****

The first Bst71I overhang in the target DNA will be
located 5-8 bases downstream of the overhang CACTT-3'. X
must therefore be 3 (see the figure below). The
following strategy can then be applied:

One linker is prepared that can address the 3' GTGAA
overhang by annealing, in one tube, a type "4"
oligonucleotide of Example 3 (6 variable bases at the 3'
end, QQQQQQ) with a type "3" oligonucleotide (5 variable
bases at the 3' end, GTGAA):

   GTGAA-3'
3'-QQQQQQ-

The 3'-GAGTGC overhang is then ligated with the X=3
initiation linker and the GTGAA-3' overhang is ligated
with the CACTT-3' overhang on the target DNA molecule:

5'--GCAGCGACCATGAGTCCA-CTC--GTGGATG---PPPPPP
3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC---QQQQQQ

GTGAA
CACTT
EXAMPLE 5 - REMOVAL OF INTERVENING SEQUENCES FROM
CONSTRUCTS

In some instances, constructs may be prepared which
contain undesirable nucleic acid sequences between, e.g.,
the insert sequence and the vector sequence. Strategies
for removing these linker sequences should then be
applied. Illustrated below are some possible strategies
in which binding sites for restriction enzymes are
provided in the adapter sequences. Cleavage with the
restriction enzymes will then result in DNA ends that
can be religated. The vector DNA is marked as ..VVVVVV
while insert DNA is marked as IIIIIII.

Method 1

Two IIS enzymes that generate 5' 4 base overhangs (BbsI
and Esp3I):

..VVVVVVGAGC-GAGACG GAAGAC--GAGCIIIIIIIIII
  VVVVVVCTCG-CTCTGC CTTCTG--CTCGIIIIIIIIII..

After cleavage with BbsI and Esp3I:

..VVVVVV     + GAGC-GAGACG GAAGAC--     + GAGCIIIIIIIIII
  VVVVVVCTCG       -CTCTGC CTTCTG--CTCG       IIIIIIIIII..

After ligation with T4 DNA ligase:

GAGC-GAGACG GAAGAC-- +
    -CTCTGC CTTCTG--CTCG

..VVVVVVGAGCIIIIIIIIII
  VVVVVVCTCGIIIIIIIIII..
Method 2

One IIS enzyme that generates two 3' 3 base
overhangs (BsaXI):

..VVVVGAG AC CTCC-GAGIIIIIIIIII
  VVVVCTC TG GAGG-CTCIIIIIIIIII..

After cleavage with BsaXI:

..VVVVGAG +     AC CTCC-GAG +    IIIIIIIIII
  VVVV      CTC TG GAGG-      CTCIIIIIIIIII..

After ligation with T4 DNA ligase:

    AC CTCC-GAG +
CTC TG GAGG-

..VVVVGAGIIIIIIIIII
  VVVVCTCIIIIIIIIII..

Method 3

One IIS enzyme that generates blunt ends (MlyI):

..VVVV-GAGTC-IIIIIIIIII
  VVVV-CTGAG-IIIIIIIIII..

After cleavage with MlyI:

..VVVV + -GAGTC- + IIIIIIIIII
  VVVV   -CTGAG-   IIIIIIIIII..

After ligation with T4 DNA ligase:

-GAGTC- +
-CTGAG-

..VVVVIIIIIIIIII
  VVVVIIIIIIIIII..

EXAMPLE 6 - IDENTIFYING OLIGONUCLEOTIDE SETS WITH 6 BASE
PAIR OVERHANGS WITH MINIMAL MISMATCH LIGATIONS

In order to identify oligonucleotide sets with 6 base
pair overhangs which are unlikely to form mismatch
ligations with one another, the following steps may be
taken.

1. Create all 2048 overhang pairs of 6 bases.

2. Remove the 32 palindromic pairs.

This produces a final set of 2016 overhang pairs.
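The final figure of 2016 pairs can be checked by brute force: 2016 = (4^6 - 4^3)/2, i.e. all 4096 six-base overhangs minus the 64 that are their own reverse complement, counted once per pair. The Python sketch below is an illustration of that count, not the program used in the Example; the helper names are hypothetical, and `as_written` only reproduces the document's habit of writing a pair as an overhang followed by its base-by-base complement.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """5'->3' sequence of the strand that pairs with `seq`."""
    return seq.translate(COMPLEMENT)[::-1]


def as_written(overhang: str) -> str:
    """Write a pair in the document's style, e.g. AAAAAC/TTTTTG (base-by-base complement)."""
    return f"{overhang}/{overhang.translate(COMPLEMENT)}"


kmers = ["".join(p) for p in product("ACGT", repeat=6)]
# A palindromic overhang is its own reverse complement, so it could ligate to itself.
palindromes = {k for k in kmers if k == reverse_complement(k)}
# Every other overhang has exactly one partner; a frozenset counts each pair once.
pairs = {frozenset((k, reverse_complement(k))) for k in kmers if k not in palindromes}

print(len(kmers), len(palindromes), len(pairs))  # 4096 64 2016
print(as_written("AAAAAC"))                      # AAAAAC/TTTTTG
```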

PART 1

1. Take a pair as pair #1 and select the next pair by
executing section 1.

Section 1

Algorithm 1

Compute the (2016 - n) tables of unweighted mismatch
scores between the already chosen n pair(s) and each of
the (2016 - n) remaining pairs, and find among the latter
the pair(s) for which the lowest score in the table is
the highest (see below for details about score
computation). If there is only one such pair, then
select it. If there are several pairs, then compute the
weighted mismatch scores of the overhang comparisons
that gave the lowest unweighted score and find the
pair(s) for which the lowest weighted score is the
highest. If there is only one such pair, then select
it. If there are several pairs, then redo the whole
procedure using the second lowest unweighted score in
the mismatch table, then the third lowest, and so on.
If several pairs remain tied after all mismatch scores
have been considered, keep them all.

Repeat algorithm 1 for each selected pair and iterate it
over the desired number of positions to obtain the
chain(s) of overhang pairs. This procedure generates a
tree with an overhang pair on each branch. The lowest
unweighted and weighted mismatch scores of the
particular combination of pairs at each point are
computed. A particular pathway is stopped (1) when the
desired number of positions is reached, (2) when the
combination of pairs is one that has already been found
earlier, or (3) when the lowest mismatch scores of that
combination are lower than the lowest scores of the
complete chain(s) already constructed. Point (3) ensures
that each new complete chain always has lowest mismatch
scores that are higher than or at least equal to those
of the previously constructed chain(s). Note also that,
as a result of this process, all pairs in a given chain
are unique and all complete chains in the tree are
unique. The whole process terminates when the last
pathway to be explored stops. Keep the complete
chain(s) whose lowest mismatch scores are the highest.

Repeat section 1 starting with each of the 2016 pairs as
pair #1 to produce a set of 2016 overhang chains. Find
the best chain(s) by applying algorithm 2.

Algorithm 2

For all chains, compute the tables of unweighted
mismatch scores between all the pairs that are present
in the chain, and find the chain(s) for which the lowest
score in the table is the highest (see below for
details). If there is only one such chain, then select
it. If there are several chains, then compute the
weighted mismatch scores of the overhang comparisons
that gave the lowest unweighted score and find the
chain(s) for which the lowest weighted score is the
highest. If there is only one such chain, then select
it. If there are several chains, then redo the whole
procedure using the second lowest unweighted score in
the mismatch table, then the third lowest, and so on.
If several chains remain tied after all mismatch scores
have been considered, then keep all of them.

This allows the production of a set of one or more
overhang chains.

PART 2

Take a chain and execute section 2.

Section 2

Algorithm 3

For that chain, find the overhang pair(s) that is (are)
responsible for the lowest unweighted and weighted
scores in the table of mismatch scores between all pairs
in the chain. Then, create new chains by substituting
that pair with all remaining overhang pairs that are not
present in the original chain (if there are several
pairs to be substituted, substitute one pair at a time).
From the complete set of newly generated chains and the
original chain, select one or more chains following
algorithm 2. Here, including the original chain in
algorithm 2 ensures that the selected chains always have
a mismatch score that is higher than or at least equal
to the score of the original chain. The improvement (if
any) may involve the lowest or nth lowest unweighted
score, or the corresponding weighted score.

Repeat algorithm 3 for each selected chain. This
procedure generates a tree with a chain on each branch.
Each new chain which is added to the tree has a mismatch
score higher than or equal to the score of the chain
found in the previous step. A particular pathway is
stopped when the selected chain is one that has already
been found earlier. This ensures that all chains in the
tree are unique. The whole process terminates when the
last pathway to be explored stops. Keep all the chains
that are present in the tree.

Repeat section 2 (i.e., construct a tree) starting with
each of the chains selected at the end of part 1.

From the whole set of chains present in all trees,
select one or more chains following algorithm 2.

This produces a final set of one or more overhang
chains.

COMPUTATION OF MISMATCH SCORES

Unweighted score

The unweighted score for a ligation between two 6-base
overhangs is the number of mismatches observed,
considering the triplets of the first 3 and the last 3
bases separately. A site counts as a mismatch when the
base in the second overhang is not the Watson-Crick
partner of the base at the same site in the first. For
example, the score for the ligation AAAAAC/TTTGCA is 0-3
and the score for AAAAAC/TCAGGG is 2-2. All possible
scores are ranked from highest to lowest according to the
order below:

highest:  3-3
          3-2/2-3
          2-2
          3-1/1-3
          2-1/1-2
          1-1
          3-0/0-3
          2-0/0-2
lowest:   1-0/0-1
Weighted score

The weighted score (WS) for a ligation is computed as
follows:

$$ WS = 6 - \sum_{i=1}^{6} \mathrm{BPS}_i $$

where $\mathrm{BPS}_i$ is the score for the particular base
pair at site i (the first letter is the base in one
overhang, the second letter the base at the same site in
the other overhang) and is given in the table below:

AA = 1.0    CA = 0.6    GA = 1.0    TA = 0.0
AC = 0.6    CC = 1.0    GC = 0.0    TC = 0.6
AG = 1.0    CG = 0.0    GG = 0.9    TG = 0.2
AT = 0.0    CT = 0.6    GT = 0.2    TT = 0.6

For the perfect match between an overhang and its
complement, WS = 6.
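Both scores are easy to compute mechanically. The Python sketch below is a minimal illustration, assuming (as the worked examples imply) that the second overhang is written base-by-base under the first; the function names are hypothetical. `rank_key` reproduces the ranking list above: the triplet with fewer mismatches decides first, then the other one.

```python
# Base-pair scores read from the table above (symmetric, so letter order does not matter).
BPS = {
    "AA": 1.0, "CA": 0.6, "GA": 1.0, "TA": 0.0,
    "AC": 0.6, "CC": 1.0, "GC": 0.0, "TC": 0.6,
    "AG": 1.0, "CG": 0.0, "GG": 0.9, "TG": 0.2,
    "AT": 0.0, "CT": 0.6, "GT": 0.2, "TT": 0.6,
}
PAIRS_WITH = {"A": "T", "T": "A", "C": "G", "G": "C"}


def unweighted(x: str, y: str) -> tuple[int, int]:
    """Mismatch counts in the first and last triplets of two aligned 6-base overhangs."""
    mism = [PAIRS_WITH[a] != b for a, b in zip(x, y)]
    return sum(mism[:3]), sum(mism[3:])


def rank_key(score: tuple[int, int]) -> tuple[int, int]:
    """Sort key matching the ranking list: 3-3 > 3-2 > 2-2 > 3-1 > ... > 1-0."""
    return min(score), max(score)


def weighted(x: str, y: str) -> float:
    """WS = 6 - sum of BPS_i over the six sites, rounded to one decimal for display."""
    return round(6 - sum(BPS[a + b] for a, b in zip(x, y)), 1)


print(unweighted("AAAAAC", "TTTGCA"))  # (0, 3)
print(unweighted("AAAAAC", "TCAGGG"))  # (2, 2)
print(weighted("AAAAAC", "CAAAAA"))    # 0.8
print(weighted("AAAAAC", "TTTTTG"))    # 6.0
```

The rounding only hides floating-point noise; the table values have one decimal place, so no information is lost.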

COMPARISON AMONG PAIRS AND CONSTRUCTION OF TABLES OF
SCORES

Finding the next overhang pair

To select the next overhang pair, tables of mismatch
scores between the pairs selected at previous positions
and all remaining pairs are computed. To construct such
a table, all previously selected pairs are compared with
the new pair and every overhang is also compared with
itself. Thus, if n pairs have already been selected, the
number of ligations considered for each table is
4n + 2(n+1) = 6n + 2. When comparing two overhangs that
are on the same DNA strand, one of them is reversed.

Let us consider the following example, where pairs
AAAAAC/TTTTTG (1A/1B) and AAACGT/TTTGCA (2A/2B) have
been chosen previously and the new pair AGTCCC/TCAGGG
(3A/3B) is tried at the next position.

The corresponding table is:

Comparison  Overhangs  Ligation         Unweighted  Weighted
                                        score       score
1 vs 1      1A/1A      AAAAAC/CAAAAA    3-3         0.8
            1B/1B      TTTTTG/GTTTTT    3-3         3.2
2 vs 2      2A/2A      AAACGT/TGCAAA    2-2         2.8
            2B/2B      TTTGCA/ACGTTT    2-2         4.4
3 vs 3      3A/3A      AGTCCC/CCCTGA    2-2         3.6
            3B/3B      TCAGGG/GGGACT    2-2         3.6
1 vs 3      1A/3A      AAAAAC/CCCTGA    3-2         2.6
            1A/3B      AAAAAC/TCAGGG    2-2         2.4
            1B/3A      TTTTTG/AGTCCC    2-2         4.0
            1B/3B      TTTTTG/GGGACT    3-2         4.6
2 vs 3      2A/3A      AAACGT/CCCTGA    3-2         2.7
            2A/3B      AAACGT/TCAGGG    2-2         3.3
            2B/3A      TTTGCA/AGTCCC    2-2         3.6
            2B/3B      TTTGCA/GGGACT    3-2         3.4

Here, the lowest score is 2-2; 2.4, given by the ligation
between overhangs 1A and 3B.

Score table for a chain

To compute the table of mismatch scores for a chain, all
overhang pairs contained in the chain are compared with
each other and every overhang is also compared with
itself. Thus, for a chain of p overhang pairs, the
number of ligations considered is 4p(p-1)/2 + 2p = 2p².
As above, one of the two overhangs is reversed in the
comparison when both are on the same DNA strand.

For example, let us consider the following 3-pair (i.e.,
4-position) chain: AAAAAC/TTTTTG (1A/1B), AAACGT/TTTGCA
(2A/2B), AGTCCC/TCAGGG (3A/3B), in which 1A is on one
fragment, 1B and 2A are on a second fragment, 2B and 3A
are on a third fragment and 3B is on a fourth fragment.

The corresponding table is:

Comparison  Overhangs  Ligation         Unweighted  Weighted
                                        score       score
1 vs 1      1A/1A      AAAAAC/CAAAAA    3-3         0.8
            1B/1B      TTTTTG/GTTTTT    3-3         3.2
2 vs 2      2A/2A      AAACGT/TGCAAA    2-2         2.8
            2B/2B      TTTGCA/ACGTTT    2-2         4.4
3 vs 3      3A/3A      AGTCCC/CCCTGA    2-2         3.6
            3B/3B      TCAGGG/GGGACT    2-2         3.6
1 vs 2      1A/2A      AAAAAC/TGCAAA    2-3         1.8
            1A/2B      AAAAAC/TTTGCA    0-3         3.8
            1B/2A      TTTTTG/AAACGT    0-3         5.0
            1B/2B      TTTTTG/ACGTTT    2-3         3.8
1 vs 3      1A/3A      AAAAAC/CCCTGA    3-2         2.6
            1A/3B      AAAAAC/TCAGGG    2-2         2.4
            1B/3A      TTTTTG/AGTCCC    2-2         4.0
            1B/3B      TTTTTG/GGGACT    3-2         4.6
2 vs 3      2A/3A      AAACGT/CCCTGA    3-2         2.7
            2A/3B      AAACGT/TCAGGG    2-2         3.3
            2B/3A      TTTGCA/AGTCCC    2-2         3.6
            2B/3B      TTTGCA/GGGACT    3-2         3.4

Here, the lowest score is 0-3; 3.8, given by the ligation
between overhangs 1A and 2B.
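The bookkeeping behind both example tables can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program used in the Example: function names are hypothetical, every "A" overhang is assumed to lie on one strand and every "B" overhang on the other (which matches which entries are reversed in the tables), and the "lowest" entry is taken as the lowest unweighted rank with ties broken by the lowest weighted score, as in the two "Here, the lowest score is..." statements.

```python
from itertools import combinations

PAIRS_WITH = {"A": "T", "T": "A", "C": "G", "G": "C"}
BPS = {
    "AA": 1.0, "CA": 0.6, "GA": 1.0, "TA": 0.0,
    "AC": 0.6, "CC": 1.0, "GC": 0.0, "TC": 0.6,
    "AG": 1.0, "CG": 0.0, "GG": 0.9, "TG": 0.2,
    "AT": 0.0, "CT": 0.6, "GT": 0.2, "TT": 0.6,
}


def unweighted(x: str, y: str) -> tuple[int, int]:
    mism = [PAIRS_WITH[a] != b for a, b in zip(x, y)]
    return sum(mism[:3]), sum(mism[3:])


def weighted(x: str, y: str) -> float:
    return round(6 - sum(BPS[a + b] for a, b in zip(x, y)), 1)


def row(lx: str, x: str, ly: str, y: str, same_strand: bool):
    """One table row; the second overhang is reversed when both are on the same strand."""
    if same_strand:
        y = y[::-1]
    return lx, ly, unweighted(x, y), weighted(x, y)


def chain_table(pairs: list[tuple[str, str]]):
    """All 2p^2 ligations for a chain of p overhang pairs."""
    rows = []
    for i, (a, b) in enumerate(pairs, start=1):
        rows += [row(f"{i}A", a, f"{i}A", a, True), row(f"{i}B", b, f"{i}B", b, True)]
    for (i, (a1, b1)), (j, (a2, b2)) in combinations(enumerate(pairs, start=1), 2):
        rows += [
            row(f"{i}A", a1, f"{j}A", a2, True),
            row(f"{i}A", a1, f"{j}B", b2, False),
            row(f"{i}B", b1, f"{j}A", a2, False),
            row(f"{i}B", b1, f"{j}B", b2, True),
        ]
    return rows


def next_pair_table(chosen: list[tuple[str, str]], new: tuple[str, str]):
    """The 6n+2 ligations: every self-comparison plus each chosen pair against `new`."""
    new_index = str(len(chosen) + 1)
    return [r for r in chain_table(chosen + [new])
            if r[0][:-1] == r[1][:-1] or r[1][:-1] == new_index]


def lowest(rows):
    """Worst entry: lowest unweighted rank first, then lowest weighted score."""
    return min(rows, key=lambda r: (min(r[2]), max(r[2]), r[3]))


chain = [("AAAAAC", "TTTTTG"), ("AAACGT", "TTTGCA"), ("AGTCCC", "TCAGGG")]
full = chain_table(chain)
trial = next_pair_table(chain[:2], chain[2])
print(len(full), lowest(full))    # 18 ('1A', '2B', (0, 3), 3.8)
print(len(trial), lowest(trial))  # 14 ('1A', '3B', (2, 2), 2.4)
```

Building the next-pair table by filtering the full chain table keeps the two counting rules (6n + 2 and 2p²) visibly consistent with each other.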
Results obtained:

Table of breaking points

PART 1

# of positions   Unweighted score   Weighted score   # of equal chains
 3               3-3                1.6              48
 4               2-2                4.0              48
 9               2-2                2.5              12
10               3-1                3.2              12
14               3-1                2.4               6
15               2-1                4.6               6
33               2-1                3.0              12
34               3-0                4.6              12
90               3-0                3.1

PART 2

# of positions   Unweighted score   Weighted score   # of equal chains
 3               3-3                1.6              48
 4               3-2                2.2              48
 9               2-2                2.5              12
10               3-1                3.2              12
14               3-1                2.4               6
15               3-1                2.0               6
33               2-1                3.0              12
34               3-0                4.6              12
90

It will be noted that the unweighted mismatch score (in
which 9 = 3-3, 8 = 3-2, 7 = 2-2, 6 = 3-1, 5 = 2-1,
4 = 1-1, 3 = 3-0, 2 = 2-0, 1 = 1-0) decreases as the
number of positions increases.

Samples of chains obtained at the end of part 1 and at
the end of part 2

3 positions (this chain is obtained at the end of both
parts):

AACTCG/TTGAGC
TCTCAC/AGAGTG

4 positions:

part 1

AATTGG/TTAACC
TGCCAC/AGGGTG
ATAGTC/TATCAG

part 2

AATGGG/TTACCC
TCGGAC/AGCCTG
TTAACG/AATTGC

9 positions (this chain is obtained at the end of both
parts):

AATCAC/TTAGTG  TACACG/ATGTGC  AGGCTG/TCCGAC
TGAGGG/ACTCCC  ACATTC/TGTAAG  TTTAGC/AAATCG
TCGGAT/AGCCTA  GGCTAG/CCGATC

10 positions (this chain is obtained at the end of both
parts):

AAAACC/TTTTGG  AGGCTC/TCCGAG  TCGATA/AGCTAT
TTGGGG/AACCCC  GTCATG/CAGTAC  ATTCAG/TAAGTC
TCATAG/AGTATC  TGCAGT/ACGTCA  AGAGAT/TCTCTA

14 positions (this chain is obtained at the end of both
parts):

ACGTGC/TGCACG  GTTGGC/CAACCG  TCAGCC/AGTCGG
TATGAG/ATACTC  TTGCGG/AACGCC  AGAGGG/TCTCCC
TGCACG/ACGTGC  AGTATC/TCATAG  CACCGC/GTGGCG
ATACAC/TATGTG  TGACTA/ACTGAT
AACTTG/TTGAAC  ACTCCG/TGAGGC



15 positions:

part 1

AAAACC/TTTTGG
TTGGGG/AACCCC
TCATAG/AGTATC
AGGCTC/TCCGAG
GTCATG/CAGTAC
TGCAGT/ACGTCA
TCGATA/AGCTAT
ATTCAG/TAAGTC
AGAGAT/TCTCTA
TACTTC/ATGAAG
AAGTAA/TTCATT
CCGTCC/GGCAGG
TGTAAC/ACATTG
ACCGTG/TGGCAC

part 2

AAAACC/TTTTGG
TTGGGG/AACCCC
TCATAG/AGTATC
AGGCTC/TCCGAG
GACAAG/CTGTTG
TCTGCT/AGACGA
TCGATA/AGCTAT
ATTCAG/TAAGTC
AGAGAT/TCTCTA
TACTTC/ATGAAG
AAGTAA/TTCATT
CCGTCC/GGCAGG
TGTAAC/ACATTG
ACCGTG/TGGCAC



33 positions (this chain is obtained at the end of both
parts):

AACTAG/TTGATC  GTAAGG/CATTCC  TCGCCT/AGCGGA
TGGAGC/ACCTCG  AAACTA/TTTGAT  TCTCGG/AGAGCC
TCAAAT/AGTTTA  GTCTCC/CAGAGG  ACCCCC/TGGGGG
CAGGCC/GTCCGG  ACAGCG/TGTCGC  TTTTCG/AAAAGC
TATCAC/ATAGTG  CACATC/GTGTAG  AAGTCA/TTCAGT
AGATTC/TCTAAG  TGTGTA/ACACAT  GTTCTC/CAAGAG
TTCCGT/AAGGCA  TAATGC/ATTACG
CCCACG/GGGTGC  GGTAAG/CCATTC
ATGCCG/TACGGC  AGTTAT/TCAATA
TCCGTC/AGGCAG  CAACAG/GTTGTC
CCACGC/GGTGCG  ATCGGC/TAGCCG
ACTATG/TGATAC  AATGCT/TTACGA
TTAGCA/AATCGT  TTGGAG/AACCTC



34 positions (this chain is obtained at the end of both
parts):

AACTCT/TTGAGA
TCGAAC/AGCTTG
CAGGGC/GTCCCG
TAAAGG/ATTTCC
TGTGCG/ACACGC
ATGTAG/TACATC
TTCCCC/AAGGGG
AATCTC/TTAGAG
TGGCGT/ACCGCA
GGCTGC/CCGACG
TTATTC/AATAAG
CACAAG/GTGTTC
TCCGAT/AGGCTA
AGTAGC/TCATCG
CCGTCG/GGCAGC
TCACTA/AGTGAT
GTGACG/CACTGC
TGAAAT/ACTTTA
AGCATG/TCGTAC
ACCGTC/TGGCAG
CCAATC/GGTTAG
ACTTAT/TGAATA
AAAGAG/TTTCTC
TTGATA/AACTAT
AAGACC/TTCTGG
CAATCC/GTTAGG
TCTCGC/AGAGCG
AGGGGG/TCCCCC
TGCCAG/ACGGTC
TACTAC/ATGATG
TTTGAC/AAACTG
ACACCG/TGTGGC
TGAGGC/ACTCCG



90 positions (this chain is obtained at the end of part 1):

AAAAAA/TTTTTT
TCTGGC/AGACCG
AAACGG/TTTGCC
CCGGCC/GGCCGG
ACGCAG/TGCGTC
TTTGCC/AAACGG
AGGTAG/TCCATC
TGCGTC/ACGCAG
AACCAA/TTGGTT
TCCATC/AGGTAG
AGTCAT/TCAGTA
CAAAAC/GTTTTG
ATCTGC/TAGACG
TCAGTA/AGTCAT
AAGGAA/TTCCTT
TAGACG/ATCTGC
CAGCCG/GTCGGC
CGCCGC/GCGGCG
ACTGTG/TGACAC
GTCGGC/CAGCCG
AGTGCG/TCACGC
TGACAC/ACTGTG
AATTTC/TTAAAG
TCACGC/AGTGCG
CATTAC/GTAATG
TTAAAG/AATTTC
ATTTTA/TAAAAT
ACCCCA/TGGGGT
CCAACG/GGTTGC
ATCCTA/TAGGAT
ATGGTA/TACCAT
GGTTGC/CCAACG
AGTATC/TCATAG
CGAAGC/GCTTCG
CACCAC/GTGGTG
TCATAG/AGTATC
ATTACC/TAATGG
AGAATA/TCTTAT
ATGTGG/TACACC
TAATGG/ATTACC
TCTTAT/AGAATA
TACACC/ATGTGG
CTCCTC/GAGGAG
ATCAAT/TAGTTA
ATGCAC/TACGTG
AGTTGA/TCAACT
TAGTTA/ATCAAT
TACGTG/ATGCAC
AATGCT/TTACGA
ACTTCA/TGAAGT
ACTAAC/TGATTG
TTACGA/AATGCT
AGCCCC/TCGGGG
TGATTG/ACTAAC
AAGCGC/TTCGCG
TCGGGG/AGCCCC
CAGTGC/GTCACG
TTCGCG/AAGCGC
ACCATG/TGGTAC
GTCACG/CAGTGC
CCCAAG/GGGTTC
TGGTAC/ACCATG
AATAAG/TTATTC
GGGTTC/CCCAAG
AGGGGA/TCCCCT
TTATTC/AATAAG
ACATCC/TGTAGG
CTAATC/GATTAG
AGATAT/TCTATA
TGTAGG/ACATCC
CGAGAG/GCTCTC
TCTATA/AGATAT
AACTTG/TTGAAC
GCTCTC/CGAGAG
AAGTCG/TTCAGC
TTGAAC/AACTTG
ACACGT/TGTGCA
TTCAGC/AAGTCG
ATAGAC/TATCTG
TGTGCA/ACACGT
AATCGA/TTAGCT
TATCTG/ATAGAC
CCTGTC/GGACAG
TTAGCT/AATCGA
AGACCG/TCTGGC
GGACAG/CCTGTC
AGGCTC/TCCGAG
TCCGAG/AGGCTC
CGGGGC/GCCCCG
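Each entry in the two lists above appears to pair a 6-base overhang with its base-by-base complement, written in the same left-to-right order. Assuming that reading (it is inferred from the entries, not stated in the text), the short Python sketch below checks an entry and flags palindromic overhangs, i.e. overhangs equal to their own reverse complement, which could ligate to a copy of themselves. The function names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    """Base-by-base complement, keeping the left-to-right order of the input."""
    return seq.translate(COMPLEMENT)

def revcomp(seq: str) -> str:
    """Reverse complement, i.e. the partner strand read 5' to 3'."""
    return complement(seq)[::-1]

def check_pair(entry: str) -> bool:
    """True if an 'XXXXXX/YYYYYY' entry lists an overhang and its aligned complement."""
    left, right = (part.strip() for part in entry.split("/"))
    return len(left) == len(right) and complement(left) == right

def is_palindromic(overhang: str) -> bool:
    """A palindromic overhang equals its own reverse complement and could self-ligate."""
    return overhang == revcomp(overhang)

# A few entries copied from the two lists above.
ENTRIES = ["AACTCT/TTGAGA", "CACAAG/GTGTTC", "CCAATC/GGTTAG",
           "AGGTAG/TCCATC", "ATCTGC/TAGACG", "CCGGCC/GGCCGG"]

for entry in ENTRIES:
    overhang = entry.split("/")[0]
    print(entry, check_pair(entry), is_palindromic(overhang))
```

Note that an overhang such as CCGGCC looks symmetric but is not palindromic in this sense, because its reverse complement is GGCCGG.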



EXAMPLE 7 - CONSTRUCTION OF A 5-FRAGMENT CHAIN ENCODING
THE BINARY SEQUENCE 1-0-1-0-0

This experiment demonstrates the construction of a specific 5-fragment chain using a set of four non-palindromic 5' 6-base overhang pairs. The set of four unique overhang pairs was found using a computer program as described in Example 6.
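The two requirements named here, non-palindromic overhangs and four mutually distinct pairs, can be tested mechanically. The Python sketch below is not the Example 6 program; it only checks a candidate set. The four overhangs are read off the oligonucleotides listed below (the first six bases of the top strands of signals 2 to 5, whose partners are the first six bases of the bottom strands of signals 1 to 4). The function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def valid_overhang_set(overhangs):
    """Return (ok, reason) for a list of junction overhangs, one per junction.

    ok is True when no overhang is palindromic and no two junctions share an
    overhang or a reverse-complement partner, so each junction can close in
    only one way.
    """
    seen = {}
    for i, oh in enumerate(overhangs):
        partner = revcomp(oh)
        if oh == partner:
            return False, f"junction {i + 1}: {oh} is palindromic"
        for seq in (oh, partner):
            if seq in seen:
                return False, f"junction {i + 1}: {seq} also used at junction {seen[seq]}"
            seen[seq] = i + 1
    return True, "ok"

# 5' overhangs on the top strands of signals 2-5 in this example.
EXAMPLE7 = ["TTCCTA", "AGTTGC", "ATGTCG", "CCTGTC"]
print(valid_overhang_set(EXAMPLE7))                # (True, 'ok')
print(valid_overhang_set(["TTCCTA", "TAGGAA"]))    # rejected: one junction used twice
```

A stricter design rule (for example a minimum number of mismatches between any two overhangs) could be added in the same loop, but that is an extra assumption, not something stated here.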

Based upon the overhang pairs, a set of five library 
15 components was made by annealing complementary 

oligonucleotides in separate tubes: 
signal 1:

5'-TAATACGACTCACTATACCACAAGTTTGTACAAAAAAGCAGGCTCTATTC-3' and
5'-TAGGAAGAATAGAGCCTGCTTTTTTGTACAAACTTGTGGTATAGTGAGTCGTATTA-3';

signal 2:

5'-TTCCTATGCAGTGGACCACTTTGTACAAGAAAGCTGGGTTGCAGT-3' and
5'-GCAACTACTGCAACCCAGCTTTCTTGTACAAAGTGGTCCACTGCA-3';

signal 3:

5'-AGTTGCTTGACGCCACAAGTTTGTACAAAAAAGCAGGCTTTGACG-3' and
5'-CGACATCGTCAAAGCCTGCTTTTTTGTACAAACTTGTGGCGTCAA-3';

signal 4:

5'-ATGTCGAAGGGCGGACCACTTTGTACAAGAAAGCTGGGTAAGGGC-3' and
5'-GACAGGGCCCTTACCCAGCTTTCTTGTACAAAGTGGTCCGCCCTT-3';

signal 5:

5'-CCTGTCATGTGGACCACTTTGTACAAGAAAGCTGGGTTTCTATAGTGTCACCTAAATC-3' and
5'-GATTTAGGTGACACTATAGAAACCCAGCTTTCTTGTACAAAGTGGTCCACAT-3';

T7: 5'-TAATACGACTCACTATACCA-3'
T7-Cy5 primer: 5'-TAATACGACTCACTATA-3'

SP6 primer: 3'-AAGATATCACAGTGGATTTAG-5'
The library components (4 pmol each) were then mixed together and ligated using 100 U T4 DNA ligase (NEB) in 1X ligase buffer at 25°C for 15 minutes. The ligase was then inactivated at 65°C for 20 min.

5 µl of the ligation reaction (50 µl) was used as template in a PCR reaction (50 µl) containing 1X Thermopol buffer (NEB), 0.05 mM dNTPs, 0.4 µM T7 primer, 0.4 µM SP6 primer and 0.04 U/µl Vent polymerase (NEB). The PCR was hot started (95°C for 3 minutes before addition of polymerase) and cycled 30 times (95°C, 30 sec; 55°C, 30 sec; 76°C, 30 sec) using a PTC-200 thermocycler (MJ Research). 10 µl of the PCR was analysed on a 1.5% agarose gel as shown in Figure 5. The gel picture showed only one intense band, corresponding to approximately 240 bp, as expected (243 bp). The remaining PCR product was extracted twice with chloroform and precipitated using 71% ethanol and 0.1 M NaAc. The DNA was dissolved in water and sequenced. The sequence confirmed that the expected signal chain (1-0-1-0-0) was generated.
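As a consistency check on the expected 243 bp product, the oligonucleotides above can be assembled in silico. This Python sketch assumes that both strands of each signal are written 5' to 3', that every junction uses the 6-base 5' overhang described above, and that the SP6 primer, given 3' to 5' in the list, reads GATTTAGGTGACACTATAGAA 5' to 3'. The helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]

# (top, bottom) strands of signals 1-5 as listed above, both written 5'->3'.
SIGNALS = [
    ("TAATACGACTCACTATACCACAAGTTTGTACAAAAAAGCAGGCTCTATTC",
     "TAGGAAGAATAGAGCCTGCTTTTTTGTACAAACTTGTGGTATAGTGAGTCGTATTA"),
    ("TTCCTATGCAGTGGACCACTTTGTACAAGAAAGCTGGGTTGCAGT",
     "GCAACTACTGCAACCCAGCTTTCTTGTACAAAGTGGTCCACTGCA"),
    ("AGTTGCTTGACGCCACAAGTTTGTACAAAAAAGCAGGCTTTGACG",
     "CGACATCGTCAAAGCCTGCTTTTTTGTACAAACTTGTGGCGTCAA"),
    ("ATGTCGAAGGGCGGACCACTTTGTACAAGAAAGCTGGGTAAGGGC",
     "GACAGGGCCCTTACCCAGCTTTCTTGTACAAAGTGGTCCGCCCTT"),
    ("CCTGTCATGTGGACCACTTTGTACAAGAAAGCTGGGTTTCTATAGTGTCACCTAAATC",
     "GATTTAGGTGACACTATAGAAACCCAGCTTTCTTGTACAAAGTGGTCCACAT"),
]
OVERHANG = 6                      # 6-base 5' overhangs, as stated in the text
T7 = "TAATACGACTCACTATACCA"       # T7 primer, 5'->3'
SP6 = "GATTTAGGTGACACTATAGAA"     # SP6 primer rewritten 5'->3'

def assemble(signals, k=OVERHANG):
    """Return the ligated top strand, or raise if the parts cannot join in this order."""
    for (_, bottom), (next_top, _) in zip(signals, signals[1:]):
        # The bottom-strand 5' overhang of one part must pair with the next top-strand 5' overhang.
        if revcomp(bottom[:k]) != next_top[:k]:
            raise ValueError(f"{bottom[:k]} does not pair with {next_top[:k]}")
    top = "".join(t for t, _ in signals)
    bottom = "".join(b for _, b in reversed(signals))
    if bottom != revcomp(top):
        raise ValueError("ligated strands are not fully complementary")
    return top

chain = assemble(SIGNALS)
print(len(chain))                                              # 243
print(chain.startswith(T7), revcomp(chain).startswith(SP6))    # True True
```

Both primers sit on the outermost bases of the assembled chain, so under these assumptions the amplicon is the full 243 bp chain.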

EXAMPLE 8 - CONSTRUCTION OF A 5X5 FRAGMENT CHAIN
ENCODING THE BINARY SEQUENCE USING ONE LIGATION CYCLE
FOLLOWED BY ONE PCR CYCLE OR BY TWO LIGATION CYCLES

This experiment demonstrates the use of complementary primer pairs to link fragment chains together as an alternative to the ligation strategy demonstrated in the previous example.

In this experiment, 5 fragment chains with 5 positions (fragments or bits) each are ligated separately in ligation cycle 1, as demonstrated earlier (Example 7). The 5 fragment chains are then amplified with 5 different primer pairs (pair 1 is used to amplify chain 1, pair 2 is used to amplify chain 2, etc.). The second primer in primer pair 1 is complementary to the first primer in primer pair 2, the second primer in primer pair 2 is complementary to the first primer in primer pair 3, and so on.



A small aliquot is then taken from each of the 5 PCR reactions, and a new PCR reaction is performed with primers that are specific to the ends of signal chains 1 and 5. The method is illustrated in Figure 6.

Materials:

Oligonucleotides are selected which bind to the fragment chain and also serve as primers. For example, adjacent chains may be bound using the following primer pairs:

fragment chain 2 terminal (with bound primer):
5'-TTCTATAGTGTCACCTAAATC-3'
3'-AAGATATCACAGTGGATTTAGCCTACCAGTACATCCAACGGCAACT-5'

fragment chain 3 terminal (with bound primer):
5'-GTCATGTAGGTTGCCGTTGATCCATCCTAATACGACTCACTATAGCA-3'
3'-ATTATGCTGAGTGATATCGT-5'



The above exemplified primer regions are complementary 
and may thus be bound together. 
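The complementarity can be checked directly against the sequence listing. In the Python sketch below (helper names are illustrative), the first 20 bases at the 5' ends of the two primers (SEQ ID NOs: 36 and 37) are reverse complements of each other, while the 3' part of each primer is the reverse complement of the chain end it binds (SEQ ID NOs: 35 and 38). Treating the 20-base tails as the joining region is an inference from these sequences, not a statement in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

# 5'->3' sequences from the sequence listing.
CHAIN2_END = "TTCTATAGTGTCACCTAAATC"                              # SEQ ID NO: 35
PRIMER_2 = "TCAACGGCAACCTACATGACCATCCGATTTAGGTGACACTATAGAA"      # SEQ ID NO: 36
PRIMER_3 = "GTCATGTAGGTTGCCGTTGATCCATCCTAATACGACTCACTATAGCA"     # SEQ ID NO: 37
CHAIN3_END = "TGCTATAGTGAGTCGTATTA"                               # SEQ ID NO: 38

def tails_pair(p: str, q: str, k: int) -> bool:
    """True if the 5'-terminal k bases of p and q are reverse complements."""
    return q[:k] == revcomp(p[:k])

def anneals_to_end(primer: str, chain_end: str) -> bool:
    """True if the primer's 3' end is the reverse complement of the chain end."""
    return primer.endswith(revcomp(chain_end))

print(tails_pair(PRIMER_2, PRIMER_3, 20))                                  # True
print(anneals_to_end(PRIMER_2, CHAIN2_END), anneals_to_end(PRIMER_3, CHAIN3_END))  # True True
```

Once both chains have been amplified, the complementary 20-base tails at their facing ends allow the two products to anneal and be extended together.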

As an alternative to this method, two ligation cycles may be used, in which 5 fragment chains (generated by ligation) are ligated together. Thus, several construction cycles may be used to build up long signal chains. After the initial ligation in the first ligation cycle, the 5 fragment chains are amplified with primers containing a FokI site. The primers are selected such that digestion with FokI makes non-palindromic overhangs at the ends of each fragment chain, in which the overhang generated in fragment chain 1 is able to ligate with the first overhang generated in fragment chain 2, the second overhang generated in fragment chain 2 is able to ligate with the first overhang generated in fragment chain 3, and so on. The 5 fragment chains can thereby be ligated together in a controlled manner to generate a final chain with 25 fragments (bits).
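How a FokI site in a primer translates into an overhang can be sketched in Python. This assumes the commonly cited FokI specificity GGATG(9/13), i.e. cutting 9 nt (same strand) and 13 nt (other strand) beyond the recognition site, which leaves a 4-base 5' overhang. The example sequences and function names are hypothetical and are not the primers used in this example.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(overhang: str) -> bool:
    return overhang == revcomp(overhang)

def foki_overhangs(top: str) -> list:
    """Overhang regions (written as top-strand bases) left by FokI on a duplex.

    Assumes FokI recognises GGATG and cuts 9 nt / 13 nt downstream of it.
    A site on the bottom strand appears as CATCC on the top strand and cuts
    upstream of it in top-strand coordinates.
    """
    found = []
    for m in re.finditer(r"(?=GGATG)", top):   # site reads 5'->3' on the top strand
        start = m.start() + 5 + 9
        if start + 4 <= len(top):
            found.append(top[start:start + 4])
    for m in re.finditer(r"(?=CATCC)", top):   # site reads 5'->3' on the bottom strand
        end = m.start() - 9
        if end - 4 >= 0:
            found.append(top[end - 4:end])
    return found

# Hypothetical chain ends carrying a FokI site in each orientation.
left_end = "GGATG" + "A" * 9 + "ACGG" + "TTTTT"
right_end = "TTTT" + "AGCA" + "G" * 9 + "CATCC"
for seq in (left_end, right_end):
    for oh in foki_overhangs(seq):
        print(oh, "palindromic" if is_palindromic(oh) else "non-palindromic")
```

Because FokI cuts outside its recognition site, the overhang bases can be chosen freely in the primer, which is what makes it possible to assign a distinct non-palindromic overhang to each junction.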

If we want to construct fragment chains with 100 or 500 fragments, we can repeat this procedure 1 or 2 more times. The polymerase capacity will, however, be a limiting factor regarding how many ligation cycles it is possible to perform. Other strategies will therefore need to be employed to construct even longer chains.
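The arithmetic behind "1 or 2 more times" can be made explicit. Assuming each round joins 5 chains, as in this example, the chain length grows by a factor of 5 per round: 25, then 125, then 625 fragments, so one extra round exceeds 100 fragments and two exceed 500. The function names below are illustrative.

```python
def chain_length(fragments_per_chain: int, chains_per_round: int, extra_rounds: int) -> int:
    """Fragments in the final chain after the first join plus `extra_rounds` more joins."""
    length = fragments_per_chain * chains_per_round
    for _ in range(extra_rounds):
        length *= chains_per_round
    return length

def extra_rounds_needed(target: int, fragments_per_chain: int = 5, chains_per_round: int = 5) -> int:
    """Smallest number of additional rounds giving at least `target` fragments."""
    rounds = 0
    while chain_length(fragments_per_chain, chains_per_round, rounds) < target:
        rounds += 1
    return rounds

print([chain_length(5, 5, r) for r in range(3)])           # [25, 125, 625]
print(extra_rounds_needed(100), extra_rounds_needed(500))  # 1 2
```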

EXAMPLE 9: CLONING OF AN INSERT FROM PHIX174 INTO PUC19
WITH A TRIMMED GENE A

This experiment demonstrates the "trimming" strategy for elimination of unwanted flanking sequences. Another important aspect of this experiment is that we demonstrate that it is possible to link a 5' and a 3' overhang together with a single stranded oligonucleotide alone. It should also be noted that the inserts are cloned into two different type IIS sites, thereby eliminating the problem of insert concatemerisation.

In this method, Gene A from PhiX174 is cloned into a pUC19 vector. PhiX174 is prepared by cleavage with BbvI, resulting in 15 fragments flanked by different non-palindromic 5' 4-base overhangs, as described in more detail in Example 1. The two overhangs adjacent to Gene A are then addressed with "initiation linkers" containing a BplI site, while the rest of the fragments are allowed to religate. T4 DNA ligase, BplI, a "propagation linker" containing a BplI site, and two "termination adaptors", addressed to the first and last five bases of Gene A respectively, are used. The solution is incubated at 37°C, allowing the trimming reaction to proceed until it is terminated when the first and last five bases of Gene A are reached.

The pUC19 vector is prepared by cleavage with HgaI and BsaI. The overhangs generated by HgaI cleavage are described in Example 1. Cleavage with BsaI results in 4 non-identical cleavages giving rise to 8 non-identical overhangs, e.g. site 1: GCCA/CGGT (1600).

Gene A has the following sequences around its first and last five bases (marked by underlining):

...GCTGGAGGCCTCCACTATGAAATCGCGTAGAG...
...CGACCTCCGGAGGTGATACTTTAGCGCATC

CTGGCGGAAAATGAGAAAATTCGACCTA...
...ACGACCGCCTTTTACTCTTTTAAGCTGG

When the trimming procedure is terminated at the underlined sequences, it is possible to clone Gene A without any unwanted flanking base pairs. The 3' 5-base overhangs generated by BplI correspond to the marked base pairs.

The overhang pair generated by HgaI and BsaI in pUC19 that is used as the cloning site for Gene A from PhiX174 is TTCTC/CGGT.

Method:

This is as described in Example 1, except that pUC19 is cut first with HgaI (NEB 4, 37°C) and thereafter with BsaI (NEB 4, 50°C).
Materials:

Initiation linker 1 (s):
5' ATT CGG TCG AGA TGC TCT CA 3'

Initiation linker 1 (as):
5' CGA CTG AGA GCA TCT CGA CCG AAT 3'

Initiation linker 2 (s):
5' GCG TTA CTG AGC GTA GCT CTG 3'

Initiation linker 2 (as):
5' CTC TCA GAG CTA CGC TCA GTA ACG C 3'

Propagation linker (s):
5' TGC TGC AGG AGC GAA TCT CNN NNN 3'

Propagation linker (as):
5' GAG ATT CGC TCC TGC AGC A 3'

Labeling linker 2 (s):
5' CTC TTG CTA TAG TGA GTC GTA TTA 3'

Labeling linker 2 (as):
5' TAA TAC GAC TCA CTA TAG CA 3'

Termination linker 1 (s):
5' AAG AGC TCA GGT CAT TGA CGT AGC TAT GAA 3'

Termination linker 1/2 (as):
5' AGC TAC GTC AAT GAC CTG AG 3'

Termination linker 1 (short version):
5' AAG AGA TGA A 3'

Termination linker 2 (s):
5' ACC GCT CAG GTC ATT GAC GTA GCT TCA TT 3'

Termination linker 2 (short version):
5' ACC GTC ATT 3'
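The text states that the initiation and propagation linkers carry a BplI site. Assuming BplI recognises GAG(N)5CTC (worth confirming against the enzyme supplier's documentation), the Python sketch below locates that motif in the (s) strands and reports the single-stranded end each annealed (s)/(as) pair leaves. The helper names are illustrative, and the termination linkers are left out because they carry an overhang at both ends.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
# Assumed BplI recognition motif GAG(N)5CTC; the lookahead also reports overlapping hits.
BPLI_SITE = re.compile(r"(?=GAG.{5}CTC)")

def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def bpli_sites(seq: str) -> list:
    """0-based start positions of GAG(N)5CTC motifs on one strand."""
    return [m.start() for m in BPLI_SITE.finditer(seq)]

def sticky_end(sense: str, antisense: str) -> str:
    """Describe the single overhang of an (s)/(as) duplex annealed flush at the other end."""
    rc = revcomp(antisense)
    if rc == sense:
        return "blunt"
    if rc.startswith(sense):
        return "5' overhang on (as): " + antisense[: len(antisense) - len(sense)]
    if sense.startswith(rc):
        return "3' overhang on (s): " + sense[len(rc):]
    if sense.endswith(rc):
        return "5' overhang on (s): " + sense[: len(sense) - len(rc)]
    if rc.endswith(sense):
        return "3' overhang on (as): " + antisense[len(sense):]
    return "not a simple one-overhang duplex"

# (s, as) pairs copied from the Materials list, 5'->3', spaces removed.
LINKERS = {
    "Initiation linker 1": ("ATTCGGTCGAGATGCTCTCA", "CGACTGAGAGCATCTCGACCGAAT"),
    "Initiation linker 2": ("GCGTTACTGAGCGTAGCTCTG", "CTCTCAGAGCTACGCTCAGTAACGC"),
    "Propagation linker": ("TGCTGCAGGAGCGAATCTCNNNNN", "GAGATTCGCTCCTGCAGCA"),
    "Labeling linker 2": ("CTCTTGCTATAGTGAGTCGTATTA", "TAATACGACTCACTATAGCA"),
}

for name, (s, a) in LINKERS.items():
    print(f"{name}: GAG(N)5CTC in (s) at {bpli_sites(s)}; {sticky_end(s, a)}")
```

Under these assumptions the propagation linker ends in a degenerate 3' NNNNN overhang, consistent with the 3' 5-base overhangs that BplI is said to generate.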

The efficiency of the trimming reaction may be assessed as follows. Overhang 6) is addressed with a γ-32P-labelled adaptor. The trimming reaction is then allowed to start from overhang 1). Aliquots are taken out at regular time intervals, and the size distribution of the DNA fragments is then analysed on a gel.
Claims:



1. A method of synthesizing a double stranded nucleic acid molecule comprising at least the steps of:

1) generating n double stranded nucleic acid fragments, wherein at least n-2 fragments have single stranded regions at both termini and 2 fragments have single stranded regions at at least one terminus, wherein (n-1) single stranded regions are complementary to (n-1) other single stranded regions, thereby producing (n-1) complementary pairs,

2) contacting said n double stranded nucleic acid fragments, simultaneously or consecutively, to effect binding of said complementary pairs of single stranded regions, and

3) optionally ligating said complementary pairs simultaneously or consecutively to produce a nucleic acid molecule consisting of n fragments,

wherein said fragment comprises a region representing a unit of information corresponding to one or more code elements and said code is alphanumeric.



2. A method of synthesizing a double stranded nucleic acid molecule comprising at least the steps of:

1) generating n double stranded nucleic acid fragments, wherein at least n-2 fragments have single stranded regions at both termini and 2 fragments have single stranded regions at at least one terminus, wherein (n-1) single stranded regions are complementary to (n-1) other single stranded regions, thereby producing (n-1) complementary pairs,

2) contacting said n double stranded nucleic acid fragments, simultaneously or consecutively, to effect binding of said complementary pairs of single stranded regions, and

3) optionally ligating said complementary pairs simultaneously or consecutively to produce a nucleic acid molecule consisting of n fragments,

wherein said fragment comprises a region representing a unit of information corresponding to one or more code elements and said code is binary.

3. A method of synthesizing a double stranded nucleic acid molecule comprising at least the steps of:

1) generating n double stranded nucleic acid fragments, wherein at least n-2 fragments have single stranded regions at both termini and 2 fragments have single stranded regions at at least one terminus, wherein (n-1) single stranded regions are complementary to (n-1) other single stranded regions, thereby producing (n-1) complementary pairs,

2) contacting said n double stranded nucleic acid fragments, simultaneously or consecutively, to effect binding of said complementary pairs of single stranded regions, and

3) optionally ligating said complementary pairs simultaneously or consecutively to produce a nucleic acid molecule consisting of n fragments,

wherein said fragment comprises a region representing a unit of information corresponding to one or more code elements and each of said one or more code elements has the formula

(X)a,

wherein

X is a nucleotide A, T, G, C or a derivative thereof which allows complementary binding and may be the same or different at each position, and

a is an integer from 4 to 10,

wherein (X)a is different for each one or more code elements.

4. A method as claimed in claim 3 wherein said code is alphanumeric.

5. A method as claimed in claim 3 wherein said code is binary.

6. A method as claimed in claim 5, wherein said code is binary and the code elements "1" and "0" have the formulae:

"0" = (X)a and "1" = (Y)b,

wherein

(X)a and (Y)b are not identical,

X and Y are each a nucleotide A, T, G, C or a derivative thereof which allows complementary binding and may be the same or different at each position, and

a and b are integers from 4 to 10.

7. A method as claimed in claim 6 wherein in the formulae (X)a and (Y)b, X and Y are the same at each position.

8. A method as claimed in any one of claims 1 to 7 wherein said fragments are each between 8 and 25 bases in length.

9. A method as claimed in any one of claims 1 to 8 wherein n is at least 10.

10. A method of synthesizing a double stranded nucleic acid molecule comprising at least the steps of:

1) generating fragment chains according to the method defined in any one of claims 1 to 9;

2) optionally generating single stranded regions at the end of said fragment chains, wherein said single stranded regions are complementary to the single stranded regions on said fragment chains thus forming complementary pairs of single stranded regions;

3) contacting said fragment chains with one another, simultaneously or consecutively, to effect binding of said complementary pairs of single stranded regions.

11. A nucleic acid molecule produced according to a method as defined in any one of claims 1 to 10, or a single stranded nucleic acid molecule thereof.

12. A method of identifying the code elements contained in a nucleic acid molecule prepared according to a method as defined in any one of claims 1 to 10, wherein a probe, carrying a signalling means, specific to one or more code elements, is bound to said nucleic acid molecule and a signal generated by said signalling means is detected, whereby said one or more code elements may be identified.

13. A library of fragments as defined in any one of claims 1 to 12, comprising (n)m fragments, wherein n is as defined in any one of claims 1 to 12 and corresponds to the length of chain that said library may produce, and m is an integer corresponding to the number of possible code elements or combinations thereof, such that fragments corresponding to all possible code elements for each position in the final chain are provided.

14. A kit for synthesizing a double stranded nucleic acid molecule comprising a library as defined in claim 13 and a ligase.
<151> 2000-06-20

<150> NO 20003190 
<210> 1
<211> 11
<212> DNA
<213> Artificial Sequence
<220>
<223> Adapter
<220>
<221> misc_feature
<222> (8)..(9)
<223> N is any nucleotide.

<400> 1
ggcccccnna a 11



<210> 2
<211> 11
<212> DNA
<213> Artificial Sequence
<220>
<223> Adapter
<220>
<221> misc_feature
<222> (7)..(9)
<223> N is any nucleotide.

<400> 2
ggggccnnnc t 11



<210> 3 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> BbvI overhang

<400> 3

cgagcgcctc cagtgcagcg gag 23



<210> 4 

<211> 24 

<212> DNA 

<213> Artificial Sequence
<220>

<223> BbvI overhang

<400> 4 

tatcgcgcct ccagtgcagc ggag 

<210> 5 

<211> 24 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> BbvI overhang

<400> 5 

ctctgcgcct ccagtgcagc ggag 



<210> 6 

<211> 24 

<212> DNA 

<213> Artificial Sequence
<220>

<223> BbvI overhang 6 (delC)

<400> 6 

ctctctccgc tgcactggag gcgc 
<210> 7

<211> 24 

<212> DNA 

<213> Artificial Sequence 



<220> 

<223> BbvI overhang 7a
<400> 7 

caacgcgcct ccagtgcagc ggag 24 



<210> 8 

<211> 24 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> BbvI overhang 9b

<400> 8 

ggtagcgcct ccagtgcagc ggag 24 



<210> 9 

<211> 25 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Cloning site 1a

<400> 9 

aagagctccg ctgcactgga ggcgc 25 



<210> 10
<211> 25
<212> DNA
<213> Artificial Sequence
<220>
<223> Cloning site 1b

<400> 10
ctcttctccg ctgcactgga ggcgc 25



<210> 11 

<211> 35 

<212> DNA 

<213> Artificial Sequence 






<220> 

<223> Consensus binding motifs of the initiation linkers 
<220> 

<221> misc_feature

<222> (19)..(24)

<223> N is any nucleotide. 

<400> 11 

gcagcgacca tgagtccanc tcnngtggat gacgc 35 

<210> 12 

<211> 37 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Initiation linker 
<220> 

<221> misc_feature

<222> (19)..(37)

<223> N is any nucleotide with the proviso that the DNA sequence from 32 to 37 is not palindromic.

<400> 12 

gcagcgacca tgagtccanc tcnngtggat gnnnnnn 37 

<210> 13 

<211> 38 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Initiation linker 
<220> 

<221> misc_feature 

<222> (19)..(38)

<223> N is any nucleotide with the proviso that the DNA sequence from 33 to 38 is not palindromic.

<400> 13 

gcagcgacca tgagtccanc tcnngtggat gnnnnnnn 38 

<210> 14 

<211> 39 

<212> DNA 

<213> Artificial Sequence 



<220>
<223> Initiation linker
<220>
<221> misc_feature
<222> (19)..(39)

<223> N is any nucleotide with the proviso that the DNA sequence from 34 to 39 is not palindromic.

<400> 14 

gcagcgacca tgagtccanc tcnngtggat gnnnnnnnn 39 

<210> 15 

<211> 40

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Initiation linker 
<220> 

<221> misc_feature 

<222> (19)..(40)

<223> N is any nucleotide with the proviso that the DNA sequence from 35 to 40 is not palindromic.

<400> 15 

gcagcgacca tgagtccanc tcnngtggat gnnnnnnnnn 40

<210> 16 

<211> 41 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Initiation linker 
<220> 

<221> misc_feature 

<222> (19)..(41)

<223> N is any nucleotide with the proviso that the DNA sequence from 36 to 41 is not palindromic.

<400> 16 

gcagcgacca tgagtccanc tcnngtggat gacgcnnnnn n 41 

<210> 17 
<211> 42 






<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Initiation linker 
<220> 

<221> misc_feature
<222> (19)..(42)

<223> N is any nucleotide with the proviso that the DNA sequence from 37 to 42 is not palindromic.

<400> 17 

gcagcgacca tgagtccanc tcnngtggat gacgcnnnnn nn 42 

<210> 18 

<211> 43 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Initiation linker 
<220> 

<221> misc_feature 

<222> (19)..(43)

<223> N is any nucleotide with the proviso that the DNA sequence from 38 to 43 is not palindromic.

<400> 18 

gcagcgacca tgagtccanc tcnngtggat gacgcnnnnn nnn 43

<210> 19 

<211> 44 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Initiation linker 
<220> 

<221> misc_feature
<222> (19)..(44)

<223> N is any nucleotide with the proviso that the DNA sequence from 39 to 44 is not palindromic.

<400> 19 

gcagcgacca tgagtccanc tcnngtggat gacgcnnnnn nnnn 44 
<210> 20

<211> 45 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Initiation linker 

<220> 

<221> misc_feature

<222> (19)..(45)

<223> N is any nucleotide with the proviso that the DNA sequence from 40 to 45 is not palindromic.



<400> 20 

gcagcgacca tgagtccanc tcnngtggat gacgcnnnnn nnnnn 45 

<210> 21 

<211> 46 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Initiation linker 

<220> 

<221> misc_feature

<222> (19)..(46)

<223> N is any nucleotide with the proviso that the DNA sequence from 41 to 46 is not palindromic.

<400> 21 

gcagcgacca tgagtccanc tcnngtggat gacgcnnnnn nnnnnn 46



<210> 22 

<211> 50 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic oligonucleotide 

<400> 22 

taatacgact cactatacca caagtttgta caaaaaagca ggctctattc 50 



<210> 23 

<211> 56 

<212> DNA 

<213> Artificial Sequence 
<220>

<223> Synthetic oligonucleotide 



<400> 23 

taggaagaat agagcctgct tttttgtaca aacttgtggt atagtgagtc gtatta 56



<210> 24 

<211> 45 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic oligonucleotide 

<400> 24 

ttcctatgca gtggaccact ttgtacaaga aagctgggtt gcagt 45 



<210> 25
<211> 45
<212> DNA
<213> Artificial Sequence
<220>
<223> Synthetic oligonucleotide

<400> 25
gcaactactg caacccagct ttcttgtaca aagtggtcca ctgca 45



<210> 26
<211> 45
<212> DNA
<213> Artificial Sequence
<220>
<223> Synthetic oligonucleotide

<400> 26
agttgcttga cgccacaagt ttgtacaaaa aagcaggctt tgacg 45



<210> 27
<211> 45
<212> DNA
<213> Artificial Sequence
<220>
<223> Synthetic oligonucleotide

<400> 27
cgacatcgtc aaagcctgct tttttgtaca
<210> 28
<211> 45
<212> DNA
<213> Artificial Sequence
<220>
<223> Synthetic oligonucleotide

<400> 28
atgtcgaagg gcggaccact ttgtacaaga aagctgggta agggc 45



<210> 29
<211> 45
<212> DNA
<213> Artificial Sequence
<220>
<223> Synthetic oligonucleotide

<400> 29
gacagggccc ttacccagct ttcttgtaca aagtggtccg ccctt 45



<210> 30
<211> 58
<212> DNA
<213> Artificial Sequence
<220>
<223> Synthetic oligonucleotide

<400> 30
cctgtcatgt ggaccacttt gtacaagaaa


<210> 31
<211> 52
<212> DNA
<213> Artificial Sequence
<220>
<223> Synthetic oligonucleotide

<400> 31
gatttaggtg acactataga aacccagctt


<210> 32
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Synthetic oligonucleotide

<400> 32
taatacgact cactatacca 20



<210> 33 

<211> 17 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic oligonucleotide 

<400> 33 

taatacgact cactata 17 



<210> 34
<211> 21
<212> DNA
<213> Artificial Sequence
<220>
<223> Synthetic oligonucleotide

<400> 34
aagatatcac agtggattta g 21

<210> 35
<211> 21
<212> DNA
<213> Artificial Sequence
<220>
<223> Fragment chain 2 terminal

<400> 35
ttctatagtg tcacctaaat c 21

<210> 36
<211> 46
<212> DNA
<213> Artificial Sequence
<220>
<223> Primer

<400> 36
tcaacggcaa cctacatgac catccgattt aggtgacact atagaa 46



<210> 37
<211> 47
<212> DNA
<213> Artificial Sequence
<220>
<223> Primer

<400> 37
gtcatgtagg ttgccgttga tccatcctaa tacgactcac tatagca



<210> 38 

<211> 20 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Fragment chain 3 terminal 

<400> 38 

tgctatagtg agtcgtatta 



<210> 39
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Initiation linker 1

<400> 39
attcggtcga gatgctctca


<210> 40
<211> 24
<212> DNA
<213> Artificial Sequence
<220>
<223> Initiation linker 1

<400> 40
cgactgagag catctcgacc gaat


<210> 41
<211> 21
<212> DNA
<213> Artificial Sequence
<220>
<223> Initiation linker 2

<400> 41
gcgttactga gcgtagctct g 21



<210> 42 

<211> 25 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Initiation linker 2 (as) 

<400> 42 

ctctcagagc tacgctcagt aacgc 25 

<210> 43
<211> 24
<212> DNA
<213> Artificial Sequence
<220>
<223> Propagation linker (s)
<220>
<221> misc_feature
<222> (20)..(24)
<223> N is any nucleotide.

<400> 43
tgctgcagga gcgaatctcn nnnn 24

<210> 44
<211> 19
<212> DNA
<213> Artificial Sequence
<220>
<223> Propagation linker (as)

<400> 44
gagattcgct cctgcagca 19

<210> 45
<211> 24
<212> DNA
<213> Artificial Sequence
<220>
<223> Labeling linker 2 (s)

<400> 45
ctcttgctat agtgagtcgt atta 24
<210> 46
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Labeling linker 2 (as)

<400> 46
taatacgact cactatagca 20



<210> 47
<211> 30
<212> DNA
<213> Artificial Sequence
<220>
<223> Termination linker 1 (s)

<400> 47
aagagctcag gtcattgacg tagctatgaa 30



<210> 48
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Termination linker 1/2 (as)

<400> 48
agctacgtca atgacctgag 20

<210> 49
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> Termination linker

<400> 49
aagagatgaa 10



<210> 50
<211> 29
<212> DNA
<213> Artificial Sequence
<220>
<223> Termination linker 2 (s)

<400> 50
accgctcagg tcattgacgt agcttcatt 29



<210> 51 

<211> 11 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> 0 starting fragment, position 1 

<400> 51 

gggggggg aa a n 

<210> 52
<211> 11
<212> DNA
<213> Artificial Sequence
<220>
<223> 0 starting fragment, position 2

<400> 52
ggggggggaa c 11

<210> 53
<211> 12
<212> DNA
<213> Artificial Sequence
<220>
<223> 0 starting fragment, position 2

<400> 53
ccccccccct tt 12

<210> 54
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> 1 starting fragment, position 2

<400> 54
aaaaaaaaac 10
<210> 55

<211> 11 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> 0 starting fragment, position 7



<400> 55 
ggggggggcc g 



11 



<210> 56
<211> 12
<212> DNA
<213> Artificial Sequence
<220>
<223> 0 starting fragment, position 7

<400> 56
cccccccccg cg 12



<210> 57 

<211> 10 

<212> DNA 

<213> Artificial Sequence 



<220> 

<223> 1 starting fragment, position 7



<400> 57 
aaaaaaaccg 



10 



<210> 58 

<211> 11 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> 1 starting fragment, position 7



<400> 58 
ttttttttgc g 



11 



<210> 59 

<211> 12 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> 0 starting fragment, position 8



<400> 59 
cccccccccc gg 



12 



<210> 60 

<211> 11 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> 1 starting fragment, position 8



<400> 60 
ttttttttcg g 



11 



<210> 61 

<211> 14 

<212> DNA 

<213> Artificial Sequence 



<220>
<223> Fragment 0, position 1.2



<400> 61 
aaaggggggg gaaa 



14 



<210> 62 

<211> 13 

<212> DNA 

<213> Artificial Sequence 



<220>
<223> Fragment 1, position 1.3



<400> 62 
aacaaaaaaa aaa 



13 



<210> 63 

<211> 14 

<212> DNA 

<213> Artificial Sequence 



<220>
<223> Fragment 0, position 8.1



<400> 63 
tttccccccc cccg 



14 



<210> 64
<211> 13
<212> DNA
<213> Artificial Sequence
<220>
<223> Fragment 1, position 8.1

<400> 64
tttttttttt tcg 13



<210> 65 

<211> 14 

<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Fragment 0, position 8.2 

<400> 65 
gttccccccc cccg 



14 



<210> 66
<211> 13
<212> DNA
<213> Artificial Sequence
<220>
<223> Fragment 1, position 8.2

<400> 66
gttttttttt tcg 13



<210> 67 

<211> 14 

<212> DNA 

<213> Artificial Sequence 



<220>
<223> Fragment 0, position 8.3



<400> 67 
cttccccccc cccg 



14 



<210> 68 

<211> 13 

<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Fragment 1, position 8.3 
<400> 68

cttttttttt tcg



<210> 69 

<211> 31 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Initiation linker 
<220> 

<221> misc_feature 

<222> (8)..(13)

<223> N is any nucleotide. 

<400> 69 

catccacnng agntggactc atggtcgctg c 



<210> 70 

<211> 32 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Initiation linker 
<220> 

<221> misc_feature

<222> (1)..(14)

<223> N is any nucleotide.

<400> 70

ncatccacnn gagntggact catggtcgct gc



<210> 71 

<211> 33 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Initiation linker 
<220> 

<221> misc_feature

<222> (1)..(15)

<223> N is any nucleotide.

<400> 71

nncatccacn ngagntggac tcatggtcgc tgc



<210> 72 

<211> 34 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Initiation linker 



<220>
<221> misc_feature
<222> (1)..(16)
<223> N is any nucleotide.



<400> 72 

nnncatccac nngagntgga ctcatggtcg ctgc 



34 



<210> 73 

<211> 35 

<212> DNA 

<213> Artificial Sequence 



<220>
<223> Initiation linker
<220>
<221> misc_feature
<222> (12)..(17)
<223> N is any nucleotide.



<400> 73 

gcgtcatcca cnngagntgg actcatggtc gctgc 



35 



<210> 74
<211> 36
<212> DNA
<213> Artificial Sequence
<220>
<223> Initiation linker
<220>
<221> misc_feature
<222> (1)..(18)
<223> N is any nucleotide.

<400> 74
ngcgtcatcc acnngagntg gactcatggt cgctgc 36
<210> 75
<211> 37
<212> DNA
<213> Artificial Sequence
<220>
<223> Initiation linker
<220>
<221> misc_feature
<222> (1)..(19)
<223> N is any nucleotide.

<400> 75
nngcgtcatc cacnngagnt ggactcatgg tcgctgc



<210> 76
<211> 38
<212> DNA
<213> Artificial Sequence
<220>
<223> Initiation linker
<220>
<221> misc_feature
<222> (1)..(20)
<223> N is any nucleotide.

<400> 76
nnngcgtcat ccacnngagn tggactcatg gtcgctgc



<210> 77
<211> 39
<212> DNA
<213> Artificial Sequence
<220>
<223> Initiation linker
<220>
<221> misc_feature
<222> (1)..(21)
<223> N is any nucleotide.

<400> 77
nnnngcgtca tccacnngag ntggactcat ggtcgctgc



<210> 78
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> Initiation linker
<220>
<221> misc_feature
<222> (1)..(22)
<223> N is any nucleotide.

<400> 78
nnnnngcgtc atccacnnga gntggactca tggtcgctgc 40



<210> 79
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> Propagation linker HgaI
<220>
<221> misc_feature
<222> (1)..(5)
<223> N is any nucleotide.

<400> 79
nnnnngcgtc 10



<210> 80
<211> 32
<212> DNA
<213> Artificial Sequence
<220>
<223> Gene A from PHIX174

<400> 80
gctggaggcc tccactatga aatcgcgtag ag 32



<210> 81 

<211> 28 

<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Gene A from PHIX174 



<400> 81

ctggcggaaa atgagaaaat tcgaccta 28



<210> 82

<211> 13

<212> DNA

<213> Artificial Sequence
<220>

<223> Recognition motif of the N-terminal part of the hsdS subunit of StyR124I

<220>

<221> misc_feature

<222> (4)..(9)

<223> N is any nucleotide.

<400> 82
gaannnnnnr tcg

13

<210> 83

<211> 14

<212> DNA

<213> Artificial Sequence
<220>

<223> Recognition motif of the C-terminal part of the hsdS subunit of StyR124I

<220>

<221> misc_feature

<222> (4)..(10)

<223> N is any nucleotide.

<400> 83
tcannnnnnn rttc

14

<210> 84

<211> 13

<212> DNA

<213> Artificial Sequence
<220>

<223> Recognition motif of a new enzyme made by merging the N- and C-terminal parts of the hsdS subunit of StyR124I

<220>

<221> misc_feature

<222> (4)..(9)

<223> N is any nucleotide.

<400> 84
gaannnnnnr ttc



13 
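The motifs in SEQ ID NOs: 82-84 use IUPAC ambiguity codes (N = any nucleotide, R = A or G). A minimal sketch of how such a motif could be scanned on both strands of a DNA sequence, assuming Python's standard `re` module; the helper names and the test sequence `example` are hypothetical and do not come from the listing.

```python
import re

# IUPAC codes needed for the motifs above, as regex character classes.
IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t",
         "r": "[ag]", "y": "[ct]", "n": "[acgt]"}
COMPLEMENT = str.maketrans("acgt", "tgca")


def motif_regex(motif):
    """Compile an IUPAC motif such as 'gaannnnnnrtcg' into a regex; the
    zero-width lookahead lets overlapping occurrences all be reported."""
    body = "".join(IUPAC[base] for base in motif.lower())
    return re.compile(f"(?=({body}))")


def find_sites(motif, dna):
    """0-based start positions of the motif on the top strand and on the
    bottom strand, both expressed in top-strand coordinates."""
    pattern = motif_regex(motif)
    dna = dna.lower()
    rc = dna.translate(COMPLEMENT)[::-1]  # bottom strand, read 5'->3'
    top = [m.start() for m in pattern.finditer(dna)]
    bottom = [len(dna) - m.start() - len(motif) for m in pattern.finditer(rc)]
    return top, bottom


example = "ccgaactgactatcggg"  # hypothetical test DNA, not from the listing
print(find_sites("gaannnnnnrtcg", example))  # SEQ ID NO: 82 motif -> ([2], [])
print(find_sites("gaannnnnnrttc", example))  # SEQ ID NO: 84 motif -> ([], [])
```

Scanning the reverse complement with the unchanged pattern avoids having to complement the IUPAC motif itself (R would become Y, and so on).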



<210> 85
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> Ligated initiation linker

<220>
<221> misc_feature
<222> (1)..(22)
<223> N is any nucleotide, with the proviso that the sequence from 1 to 6 is complementary to the sequence from 40 to 35 of SEQ ID NO: 15



<400> 85 

nnnnnnnnnc atccacnnga gntggactca tggtcgctgc 



40 



<210> 86
<211> 47
<212> DNA
<213> Artificial Sequence

<220>
<223> An example of sequences that generate 5'-4 base overhangs by BbsI and Esp3I

<220>
<221> misc_feature
<222> (1)..(47)
<223> N is any nucleotide.



<400> 86 

nnnnnnnnga gcngagacgn nnnnngaaga cnngagcnnn nnnnnnn 



47 



<210> 87 

<211> 47 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An example of sequences that generate 5'-4 base overhangs by BbsI and Esp3I

<220>

<221> misc_feature

<222> (1)..(47)

<223> N is any nucleotide.

<400> 87

nnnnnnnnnn gctcnngtct tcnnnnnncg tctcngctcn nnnnnnn 47

<210> 88 

<211> 29 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An example of 5'-4 base overhangs generated by BbsI and Esp3I cleavage

<220>

<221> misc_feature

<222> (5)..(25)

<223> N is any nucleotide.

<400> 88

gagcngagac gnnnnnngaa gacnngagc 29

<210> 89 

<211> 25 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An example of 5'-4 base overhangs generated by BbsI and Esp3I cleavage

<220>

<221> misc_feature

<222> (5)..(25)

<223> N is any nucleotide.

<400> 89 

gctcnngtct tcnnnnnncg tctcn 25 
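As an illustration only, the Python sketch below applies the commonly cited cleavage positions BbsI GAAGAC(2/6) and Esp3I CGTCTC(1/5) to SEQ ID NO: 86. Under those assumed offsets the excised duplex spans positions 9-37 of SEQ ID NO: 86 (the span given as SEQ ID NO: 88), its bottom strand reproduces SEQ ID NO: 89, and its ends carry the 4-nt 5' overhangs GAGC (top strand) and GCTC (bottom strand). The helper names `cuts`, `revcomp` and `joined` are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("acgtn", "tgcan")

# Type IIS enzymes as (site, top-strand offset, bottom-strand offset);
# assumed offsets: BbsI GAAGAC(2/6) and Esp3I CGTCTC(1/5).
ENZYMES = {"BbsI": ("gaagac", 2, 6), "Esp3I": ("cgtctc", 1, 5)}


def joined(text):
    """Remove the spaces that split a listing line into 10-nt groups."""
    return "".join(text.split())


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def cuts(seq, site, top_off, bot_off):
    """(top_cut, bottom_cut) for every occurrence of the site on either
    strand; each cut is the top-strand index before which that strand
    is cleaved."""
    result = []
    for m in re.finditer(site, seq):           # site read on the top strand
        result.append((m.end() + top_off, m.end() + bot_off))
    for m in re.finditer(revcomp(site), seq):  # site read on the bottom strand
        result.append((m.start() - bot_off, m.start() - top_off))
    return result


SEQ86 = joined("nnnnnnnnga gcngagacgn nnnnngaaga cnngagcnnn nnnnnnn")
[(left_top, left_bot)] = cuts(SEQ86, *ENZYMES["Esp3I"])
[(right_top, right_bot)] = cuts(SEQ86, *ENZYMES["BbsI"])

top_strand = SEQ86[left_top:right_top]
bottom_strand = revcomp(SEQ86[left_bot:right_bot])
left_overhang = SEQ86[left_top:left_bot]              # 5' tail of top strand
right_overhang = revcomp(SEQ86[right_top:right_bot])  # 5' tail of bottom strand

print(top_strand, bottom_strand, left_overhang, right_overhang)
```

Because the Esp3I site sits on the bottom strand of SEQ ID NO: 86, its cuts fall upstream of the site in top-strand coordinates, which is why the second loop subtracts the offsets.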

<210> 90 

<211> 22 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An example of ligation products between 5'-4 base overhangs generated by BbsI and Esp3I cleavage

<220>

<221> misc_feature

<222> (1)..(22)

<223> N is any nucleotide.

<400> 90

nnnnnnnnga gcnnnnnnnn nn 22

<210> 91

<211> 22

<212> DNA

<213> Artificial Sequence
<220>

<223> An example of ligation products between 5'-4 base overhangs generated by BbsI and Esp3I cleavage

<220>

<221> misc_feature

<222> (1)..(22)

<223> N is any nucleotide.

<400> 91

nnnnnnnnnn gctcnnnnnn nn

22

<210> 92

<211> 51

<212> DNA

<213> Artificial Sequence
<220>

<223> An example of sequences that generate two 3' 3 base overhangs by BsaXI

<220>

<221> misc_feature

<222> (1)..(51)

<223> N is any nucleotide.

<400> 92

nnnnnnnnga gnnnnnnnnn acnnnnnctc cnnnnnnnga gnnnnnnnnn n

51



<210> 93
<211> 51
<212> DNA
<213> Artificial Sequence

<220>
<223> An example of sequences that generate two 3' 3 base overhangs by BsaXI

<220>
<221> misc_feature
<222> (1)..(51)
<223> N is any nucleotide.

<400> 93

nnnnnnnnnn ctcnnnnnnn ggagnnnnng tnnnnnnnnn ctcnnnnnnn n 51



<210> 94
<211> 30
<212> DNA
<213> Artificial Sequence

<220>
<223> An example of 3' 3 base

<220>
<221> misc_feature
<222> (1)..(27)
<223> N is any nucleotide.

<400> 94

nnnnnnnnna cnnnnnctcc nnnnnnngag 30



<210> 95
<211> 30
<212> DNA
<213> Artificial Sequence

<220>
<223> An example of 3' 3 base

<220>
<221> misc_feature
<222> (1)..(27)
<223> N is any nucleotide.



<400> 95 

nnnnnnngga gnnnnngtnn nnnnnnnctc 30 
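BsaXI is commonly listed as (9/12)ACNNNNNCTCC(10/7), i.e. it cleaves on both sides of its recognition site. Taking those offsets as an assumption, the Python sketch below excises the fragment from SEQ ID NO: 92. The resulting top strand matches SEQ ID NO: 94, the bottom strand matches SEQ ID NO: 95, and each strand ends in a 3-nt 3' overhang (GAG and CTC). The constant and function names are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("acgtn", "tgcan")

# Assumed BsaXI cleavage: (9/12)ACNNNNNCTCC(10/7); each '.' stands for one N.
SITE = re.compile("ac.....ctcc")
UP_TOP, UP_BOT, DOWN_TOP, DOWN_BOT = 9, 12, 10, 7


def joined(text):
    """Remove the spaces that split a listing line into 10-nt groups."""
    return "".join(text.split())


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def bsaxi_fragment(seq):
    """Top and bottom strands (5'->3') of the fragment released by the two
    cuts flanking a single BsaXI site read on the top strand."""
    m = SITE.search(seq)
    if m is None:
        raise ValueError("no BsaXI site on the top strand")
    top = seq[m.start() - UP_TOP : m.end() + DOWN_TOP]
    bottom = revcomp(seq[m.start() - UP_BOT : m.end() + DOWN_BOT])
    return top, bottom


SEQ92 = joined("nnnnnnnnga gnnnnnnnnn acnnnnnctc cnnnnnnnga gnnnnnnnnn n")
top, bottom = bsaxi_fragment(SEQ92)
print(top, top[-3:])        # top strand and its 3' overhang
print(bottom, bottom[-3:])  # bottom strand and its 3' overhang
```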

<210> 96

<211> 44

<212> DNA

<213> Artificial Sequence

<220>

<223> An example of sequences that generate blunt ends by MlyI
<220>

<221> misc_feature

<222> (1)..(44)

<223> N is any nucleotide.



<400> 96 

nnnnnnnnnn nnnnnnnnnn nnnngagtcn nnnnnnnnnn nnnn 



<210> 97 

<211> 26 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An example of 3' 3 base overhangs generated by MlyI cleavage
<220>

<221> misc_feature

<222> (1)..(26)

<223> N is any nucleotide. 



<400> 97 

nnnnnnnnnn nnnnnngagt cnnnnn 



<210> 98 

<211> 30 

<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Gene A from PHIX174 
<400> 98 

ctacgcgatt tcatagtgga ggcctccagc 



<210> 99 

<211> 28 

<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Gene A from PHIX174 
<400> 99 

ggtcgaattt tctcattttc cgccagca 
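The Gene A oligonucleotides appear to pair with one another: SEQ ID NO: 98 with SEQ ID NO: 80, and SEQ ID NO: 99 with SEQ ID NO: 81. A small brute-force Python sketch (hypothetical helper `best_pairing`) that slides the reverse complement of one oligo along the other and reports the longest perfect pairing:

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def joined(text):
    """Remove the spaces that split a listing line into 10-nt groups."""
    return "".join(text.split())


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def best_pairing(x, y, min_len=10):
    """Longest perfect Watson-Crick pairing between oligos x and y (both
    5'->3'). Returns (shift, length), where revcomp(y)[0] lies opposite
    x[shift]; returns None if no pairing of at least min_len bases exists."""
    r = revcomp(y)
    best = None
    for shift in range(-len(r) + 1, len(x)):
        lo, hi = max(0, shift), min(len(x), shift + len(r))
        if hi - lo >= min_len and x[lo:hi] == r[lo - shift:hi - shift]:
            if best is None or hi - lo > best[1]:
                best = (shift, hi - lo)
    return best


SEQ80 = joined("gctggaggcc tccactatga aatcgcgtag ag")
SEQ81 = joined("ctggcggaaa atgagaaaat tcgaccta")
SEQ98 = joined("ctacgcgatt tcatagtgga ggcctccagc")
SEQ99 = joined("ggtcgaattt tctcattttc cgccagca")

print(best_pairing(SEQ80, SEQ98))  # -> (0, 30)
print(best_pairing(SEQ81, SEQ99))  # -> (-2, 26)
```

Read (0, 30) as: SEQ ID NO: 98 pairs with nucleotides 1-30 of SEQ ID NO: 80, leaving the terminal "ag" of SEQ ID NO: 80 unpaired. Read (-2, 26) as: a 26-bp duplex in which SEQ ID NO: 81 and SEQ ID NO: 99 each keep a 2-nt 3' tail.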



<210> 100

<211> 10 

<212> DNA 

<213> Artificial Sequence 



<220>
<223> 1 starting fragment, position 1



<400> 100 
aaaaaaaaaa 



10 



<210> 101 

<211> 11 

<212> DNA 

<213> Artificial Sequence 



<220>
<223> 1 starting fragment, position 2



<400> 101 
tttttttttt t 



11 



<210> 102 

<211> 13 

<212> DNA 

<213> Artificial Sequence 



<220>
<223> Fragment 1, position 1.2



<400> 102 
aaaaaaaaaa aaa 



13 



<210> 103 

<211> 14 

<212> DNA 

<213> Artificial Sequence 



<220>
<223> Fragment 0, position 1.3



<400> 103 
aacggggggg gaaa 



14 



<210> 104 

<211> 14 

<212> DNA 

<213> Artificial Sequence 



<220>
<223> Fragment 0, position 8.3



<400> 104
cttccccccc cccg 



14 



<210> 105 

<211> 13 

<212> DNA 

<213> Artificial Sequence 



<220>
<223> Fragment 1, position 8.3



<400> 105 
cttttttttt tcg



13 


